MATH

Math · % accuracy

12,500 competition math problems across 7 subjects.

At a glance

Top score: o3 (OpenAI), 97.8 % accuracy
Total results: 9
Models tested: 9
Providers: 5
Verified · Self-reported: 0 · 9
Average: 87.68 % accuracy
Median: 90 % accuracy
Range: 70.2 – 97.8 % accuracy
Latest result: Jun 1, 2025

Score distribution

9 results across 10 score bands spanning 70.2 to 97.8 (midpoint tick at 84.0).
Results per band, lowest to highest: 1, 0, 1, 1, 0, 0, 1, 1, 2, 2
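The page does not say how the bands are drawn. As a minimal sketch, assuming ten equal-width bands between the lowest and highest score (an assumption, not something the page states), the following reproduces the per-band counts from the leaderboard scores. The function name `band_counts` is hypothetical.

```python
# Scores copied from the leaderboard on this page.
scores = [97.8, 97.3, 94.8, 93.3, 90.0, 87.1, 80.4, 78.2, 70.2]


def band_counts(values, n_bands=10):
    """Count values in n_bands equal-width bands spanning min(values)..max(values).

    Equal-width banding is an assumption about how the chart was built.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bands
    counts = [0] * n_bands
    if width == 0:
        # All values identical: put everything in the first band.
        counts[0] = len(values)
        return counts
    for v in values:
        # The top edge belongs to the last band, hence the min().
        idx = min(int((v - lo) / width), n_bands - 1)
        counts[idx] += 1
    return counts


counts = band_counts(scores)
print(counts)
```

Under that assumption each band is 2.76 points wide, and the two highest bands (roughly 92.3 and above) hold four of the nine results.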

Methodology

Problems come from AMC/AIME-style sources and have numerical or closed-form answers.
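Because the answers are numerical or closed-form, a final answer can in principle be checked automatically. The page does not describe its grader, so the following is only an illustrative sketch: the helpers `normalize`, `is_correct`, and `accuracy` are hypothetical, and the rule used here (exact rational comparison when both answers parse as numbers, otherwise whitespace-insensitive string equality) is an assumption.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Strip whitespace, surrounding dollar signs, and a trailing period."""
    text = answer.strip().strip("$").strip()
    if text.endswith("."):
        text = text[:-1]
    return text.replace(" ", "")


def is_correct(predicted: str, reference: str) -> bool:
    """Hypothetical grader: numeric equality when both parse, else string equality."""
    p, r = normalize(predicted), normalize(reference)
    try:
        return Fraction(p) == Fraction(r)
    except (ValueError, ZeroDivisionError):
        # Not plain numbers (e.g. symbolic expressions): compare as text.
        return p == r


def accuracy(pairs) -> float:
    """Percentage of (predicted, reference) pairs graded as correct."""
    pairs = list(pairs)
    return 100.0 * sum(is_correct(p, r) for p, r in pairs) / len(pairs)


# Made-up examples, not taken from the benchmark.
examples = [("0.5", "1/2"), ("$42$", "42"), ("x+1", "x + 1"), ("7", "8")]
print(accuracy(examples))
```

A real grader would need far more normalization (LaTeX fractions, radicals, units, equivalent symbolic forms); this sketch only shows why closed-form answers make an accuracy metric straightforward to compute.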

Limitations

Some training contamination is present for older models, and new problems will be needed over time.

By provider

  • OpenAI · 5 models
    o3: 97.8 % accuracy
    Average: 87.98 % accuracy · Best: 97.8 % accuracy
  • DeepSeek · 1 model
    DeepSeek R1: 97.3 % accuracy
    Average: 97.3 % accuracy · Best: 97.3 % accuracy
  • xAI · 1 model
    Grok 3: 93.3 % accuracy
    Average: 93.3 % accuracy · Best: 93.3 % accuracy
  • Microsoft · 1 model
    Phi-4: 80.4 % accuracy
    Average: 80.4 % accuracy · Best: 80.4 % accuracy
  • Anthropic · 1 model
    Claude Opus 4: 78.2 % accuracy
    Average: 78.2 % accuracy · Best: 78.2 % accuracy
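The per-provider figures can be recomputed from the leaderboard rows. The sketch below assumes each provider's average is the plain arithmetic mean of its models' scores, rounded to two decimals; the page does not state this, but the numbers agree. The names `rows` and `summarize` are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# (model, provider, score) rows copied from the leaderboard on this page.
rows = [
    ("o3", "OpenAI", 97.8),
    ("DeepSeek R1", "DeepSeek", 97.3),
    ("o1", "OpenAI", 94.8),
    ("Grok 3", "xAI", 93.3),
    ("o3-mini", "OpenAI", 90.0),
    ("GPT-5", "OpenAI", 87.1),
    ("Phi-4", "Microsoft", 80.4),
    ("Claude Opus 4", "Anthropic", 78.2),
    ("GPT-4o mini", "OpenAI", 70.2),
]

# Group (score, model) pairs by provider.
by_provider = defaultdict(list)
for model, provider, score in rows:
    by_provider[provider].append((score, model))


def summarize(entries):
    """Return model count, best model, best score, and mean score for one provider."""
    best_score, best_model = max(entries)
    return {
        "models": len(entries),
        "best_model": best_model,
        "best": best_score,
        "average": round(mean([s for s, _ in entries]), 2),
    }


summary = {provider: summarize(entries) for provider, entries in by_provider.items()}

# List providers by their best score, highest first.
for provider in sorted(summary, key=lambda p: summary[p]["best"], reverse=True):
    print(provider, summary[provider])
```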

Full leaderboard

#  Model          Provider   Score (% accuracy)
1  o3             OpenAI     97.8
2  DeepSeek R1    DeepSeek   97.3
3  o1             OpenAI     94.8
4  Grok 3         xAI        93.3
5  o3-mini        OpenAI     90
6  GPT-5          OpenAI     87.1
7  Phi-4          Microsoft  80.4
8  Claude Opus 4  Anthropic  78.2
9  GPT-4o mini    OpenAI     70.2
