MMLU-Pro
General knowledge · % accuracy
A harder reformulation of MMLU with 10 answer choices and deeper reasoning.
At a glance
Total results
7
Models tested
7
Providers
5
Verified: 2 · Self-reported: 5
Average
75.54 % accuracy
Median
76 % accuracy
Range
63.1 – 81.2 % accuracy
Latest result
Jun 1, 2025
Score distribution
7 results across 10 score bands from 63.1 to 81.2 % accuracy: one outlier in the lowest band, with the remaining six results clustered in the top three bands.
Methodology
10-way multiple choice; substantially reduced prompt sensitivity compared with MMLU.
Limitations
Newer benchmark with fewer published results than MMLU. Still a multiple-choice exam.
By provider
- OpenAI · 3 models · o3 · Average: 73.43 % accuracy · Best: 81.2 % accuracy
- xAI · 1 model · Grok 3 · Average: 79.3 % accuracy · Best: 79.3 % accuracy
- Anthropic · 1 model · Claude Opus 4 · Average: 77.5 % accuracy · Best: 77.5 % accuracy
- DeepSeek · 1 model · DeepSeek V3 · Average: 75.9 % accuracy · Best: 75.9 % accuracy
- Google · 1 model · Gemini 2 Pro · Average: 75.8 % accuracy · Best: 75.8 % accuracy
Full leaderboard
Showing 7 of 7

| # | Model | Provider | Score (% accuracy) |
|---|---|---|---|
| 1 | o3 | OpenAI | 81.2 |
| 2 | Grok 3 | xAI | 79.3 |
| 3 | Claude Opus 4 | Anthropic | 77.5 |
| 4 | GPT-5 | OpenAI | 76 |
| 5 | DeepSeek V3 | DeepSeek | 75.9 |
| 6 | Gemini 2 Pro | Google | 75.8 |
| 7 | GPT-4o mini | OpenAI | 63.1 |
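The summary statistics shown at the top (average, median, range) follow directly from the seven leaderboard scores. A minimal Python sketch reproducing them, using the standard-library `statistics` module:

```python
import statistics

# Scores from the leaderboard above (% accuracy)
scores = {
    "o3": 81.2,
    "Grok 3": 79.3,
    "Claude Opus 4": 77.5,
    "GPT-5": 76.0,
    "DeepSeek V3": 75.9,
    "Gemini 2 Pro": 75.8,
    "GPT-4o mini": 63.1,
}

values = list(scores.values())
print(f"Average: {statistics.mean(values):.2f}")  # -> Average: 75.54
print(f"Median: {statistics.median(values):g}")   # -> Median: 76
print(f"Range: {min(values)} - {max(values)}")    # -> Range: 63.1 - 81.2
```

Note that the headline average (75.54) is the mean over all seven results, not over providers; the per-provider averages in the section above are computed within each provider's models.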