MMLU-Pro

General knowledge · % accuracy

A harder reformulation of MMLU, with 10 answer choices per question and an emphasis on deeper reasoning.

At a glance

🏆 Top score
o3 (OpenAI): 81.2 % accuracy
Total results
7
Models tested
7
Providers
5
Verified: 2 · Self-reported: 5
Average
75.54 % accuracy
Median
76 % accuracy
Range
63.1 – 81.2 % accuracy
Latest result
Jun 1, 2025

Score distribution

Counts per band (10 equal-width bands from 63.1 to 81.2 % accuracy):
1, 0, 0, 0, 0, 0, 0, 4, 1, 1
7 results across 10 score bands

Methodology

10-way multiple choice; the larger answer set substantially reduces prompt sensitivity compared with MMLU.
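
As a minimal sketch of how 10-way multiple-choice accuracy is computed (the data and the `accuracy` helper below are hypothetical; only the A-J answer format comes from the benchmark's design):

```python
# Answer options in a 10-way multiple-choice item are labeled A through J.
CHOICES = "ABCDEFGHIJ"

def accuracy(predictions, answers):
    """Fraction of items where the predicted letter matches the gold letter."""
    assert all(p in CHOICES for p in predictions), "predictions must be A-J"
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical run: 3 of 4 items answered correctly.
print(accuracy(["A", "J", "C", "D"], ["A", "J", "C", "B"]))  # 0.75
```

With 10 options, random guessing yields 10 % expected accuracy rather than MMLU's 25 %, which widens the gap between chance and genuine capability.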

Limitations

A newer benchmark, so fewer published numbers are available than for MMLU. Still a multiple-choice exam.

By provider

  • OpenAI · 3 models
    Top model: o3 (81.2 % accuracy)
    Average: 73.43 % accuracy · Best: 81.2 % accuracy
  • xAI · 1 model
    Top model: Grok 3 (79.3 % accuracy)
    Average: 79.3 % accuracy · Best: 79.3 % accuracy
  • Anthropic · 1 model
    Top model: Claude Opus 4 (77.5 % accuracy)
    Average: 77.5 % accuracy · Best: 77.5 % accuracy
  • DeepSeek · 1 model
    Top model: DeepSeek V3 (75.9 % accuracy)
    Average: 75.9 % accuracy · Best: 75.9 % accuracy
  • Google · 1 model
    Top model: Gemini 2 Pro (75.8 % accuracy)
    Average: 75.8 % accuracy · Best: 75.8 % accuracy

Full leaderboard

Showing 7 of 7
#  Model          Provider   Score (% accuracy)
1  o3             OpenAI     81.2
2  Grok 3         xAI        79.3
3  Claude Opus 4  Anthropic  77.5
4  GPT-5          OpenAI     76.0
5  DeepSeek V3    DeepSeek   75.9
6  Gemini 2 Pro   Google     75.8
7  GPT-4o mini    OpenAI     63.1
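
The summary statistics in the "At a glance" section follow directly from the seven scores in the table above; a quick check with the standard library:

```python
from statistics import mean, median

# The seven leaderboard scores (% accuracy), taken from the table above.
scores = [81.2, 79.3, 77.5, 76.0, 75.9, 75.8, 63.1]

print(f"Average: {mean(scores):.2f}")             # 75.54
print(f"Median:  {median(scores):g}")             # 76
print(f"Range:   {min(scores)} - {max(scores)}")  # 63.1 - 81.2
```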
