GSM8K
Math · % accuracy
Grade-school math word problems.
At a glance
🏆 Top score
DeepSeek V3 · 97.1 % accuracy
Total results
8
Models tested
8
Providers
6
Verified · Self-reported
0 · 8
Average
95.25 % accuracy
Median
95.6 % accuracy
Range
93 – 97.1 % accuracy
Latest result
Jun 1, 2025
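The average, median, and range above follow directly from the eight leaderboard scores. A minimal sketch of that arithmetic, assuming the page rounds to two decimal places (the variable names are illustrative, not taken from the site):

```python
from statistics import mean, median

# Scores copied from the full leaderboard on this page (% accuracy).
scores = [97.1, 97.0, 96.1, 95.8, 95.4, 94.4, 93.2, 93.0]

# Rounding to two decimals is an assumption about how the page displays values.
average = round(mean(scores), 2)
middle = round(median(scores), 2)
lowest, highest = min(scores), max(scores)

print(f"Average: {average} % accuracy")
print(f"Median: {middle} % accuracy")
print(f"Range: {lowest} to {highest} % accuracy")
```

With an even number of results, the median is the mean of the two middle scores (95.4 and 95.8).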
Score distribution
Results per score band, from 93.0 to 97.1 % accuracy (lowest band first): 2, 0, 0, 1, 0, 1, 1, 1, 0, 2
8 results across 10 score bands
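The band counts in this section are consistent with splitting the 93.0 to 97.1 range into 10 equal-width bands. The sketch below assumes that binning rule, which the page does not state; `band_counts` is a hypothetical helper name.

```python
def band_counts(scores, bands=10):
    """Count scores in equal-width bands between the lowest and highest score."""
    low, high = min(scores), max(scores)
    width = (high - low) / bands
    counts = [0] * bands
    if width == 0:
        # All scores are identical, so everything falls in the first band.
        counts[0] = len(scores)
        return counts
    for score in scores:
        # The highest score would index one past the end, so clamp it
        # into the last band.
        index = min(int((score - low) / width), bands - 1)
        counts[index] += 1
    return counts


# Scores copied from the full leaderboard on this page (% accuracy).
scores = [97.1, 97.0, 96.1, 95.8, 95.4, 94.4, 93.2, 93.0]
print(band_counts(scores))
```

Under this assumption the two lowest and two highest bands hold two results each, with the rest spread thinly in between.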
Methodology
8.5k grade-school math word problems; final numeric answer is checked.
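As a rough illustration of what checking the final numeric answer can look like: GSM8K reference solutions end with a `#### <answer>` line, and a common heuristic is to compare that value with the last number in the model's output. The sketch below assumes that heuristic and is not necessarily how any score on this page was produced; the function names, tolerance, and example strings are all hypothetical.

```python
import re

# Matches integers and decimals, optionally negative, with thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text):
    """Return the last number found in `text` as a float, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Read the value after the '####' marker that ends a reference solution."""
    return final_number(solution.split("####")[-1])


def is_correct(model_output, solution, tol=1e-6):
    """Assumed rule: the last number in the output must match the reference."""
    predicted = final_number(model_output)
    expected = reference_answer(solution)
    if predicted is None or expected is None:
        return False
    return abs(predicted - expected) <= tol


def accuracy(outputs, solutions):
    """Percentage of outputs whose final number matches the reference."""
    correct = sum(is_correct(o, s) for o, s in zip(outputs, solutions))
    return 100 * correct / len(solutions)


# Toy strings for illustration only; these are not dataset items.
outputs = ["9 * 2 = 18, so the total is 18.", "The answer is 1,250 apples"]
solutions = ["9 * 2 = 18\n#### 18", "5 * 240 = 1200\n#### 1200"]
print(accuracy(outputs, solutions))
```

Only the final value is compared, so a correct answer reached through flawed reasoning still counts as correct under this kind of check.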
Limitations
Mostly saturated on frontier models: all 8 results here fall between 93 and 97.1 % accuracy, leaving little headroom for differentiation.
By provider
- DeepSeek · 1 model · Top model: DeepSeek V3 (97.1 % accuracy) · Average: 97.1 % accuracy · Best: 97.1 % accuracy
- OpenAI · 3 models · Top model: o3-mini (97 % accuracy) · Average: 95.43 % accuracy · Best: 97 % accuracy
- Microsoft · 1 model · Top model: Phi-4 (95.8 % accuracy) · Average: 95.8 % accuracy · Best: 95.8 % accuracy
- Anthropic · 1 model · Top model: Claude Opus 4 (95.4 % accuracy) · Average: 95.4 % accuracy · Best: 95.4 % accuracy
- Google · 1 model · Top model: Gemini 2 Pro (94.4 % accuracy) · Average: 94.4 % accuracy · Best: 94.4 % accuracy
- Meta · 1 model · Top model: Llama 3 70B (93 % accuracy) · Average: 93 % accuracy · Best: 93 % accuracy
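The per-provider averages and best scores can be derived by grouping the leaderboard rows by provider. This is a minimal sketch, assuming two-decimal rounding and ordering by best score; `by_provider` is an illustrative name.

```python
from collections import defaultdict

# (model, provider, score in % accuracy), copied from the leaderboard on this page.
rows = [
    ("DeepSeek V3", "DeepSeek", 97.1),
    ("o3-mini", "OpenAI", 97.0),
    ("GPT-5", "OpenAI", 96.1),
    ("Phi-4", "Microsoft", 95.8),
    ("Claude Opus 4", "Anthropic", 95.4),
    ("Gemini 2 Pro", "Google", 94.4),
    ("GPT-4o mini", "OpenAI", 93.2),
    ("Llama 3 70B", "Meta", 93.0),
]


def by_provider(rows):
    """Summarize model count, average, and best score for each provider."""
    grouped = defaultdict(list)
    for _model, provider, score in rows:
        grouped[provider].append(score)
    summary = {
        provider: {
            "models": len(scores),
            "average": round(sum(scores) / len(scores), 2),
            "best": max(scores),
        }
        for provider, scores in grouped.items()
    }
    # Order providers by best score, highest first (assumed display order).
    ordered = sorted(summary.items(), key=lambda item: item[1]["best"], reverse=True)
    return dict(ordered)


summary = by_provider(rows)
for provider, stats in summary.items():
    print(provider, stats)
```

OpenAI is the only provider with more than one result, which is why its average (95.43) differs from its best score (97).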
Full leaderboard
Showing 8 of 8

| # | Model | Provider | Score (% accuracy) |
|---|---|---|---|
| 1 | DeepSeek V3 | DeepSeek | 97.1 |
| 2 | o3-mini | OpenAI | 97 |
| 3 | GPT-5 | OpenAI | 96.1 |
| 4 | Phi-4 | Microsoft | 95.8 |
| 5 | Claude Opus 4 | Anthropic | 95.4 |
| 6 | Gemini 2 Pro | Google | 94.4 |
| 7 | GPT-4o mini | OpenAI | 93.2 |
| 8 | Llama 3 70B | Meta | 93 |