Aider Polyglot

Coding · % pass@2

Real-world coding edits across 6 programming languages. Measures whether the model produces a correct, test-passing edit within two attempts.

At a glance

🏆 Top score: GPT-5 (OpenAI), 88 % pass@2
Total results: 13
Models tested: 13
Providers: 7
Verified · Self-reported: 13 · 0
Average: 48.95 % pass@2
Median: 53.8 % pass@2
Range: 3.6 – 88 % pass@2
Latest result: Apr 18, 2026

Score distribution

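Results per score band, lowest to highest: 2, 1, 0, 0, 1, 3, 3, 0, 2, 1
Score axis (% pass@2): 3.6 (lowest), 45.8 (midpoint), 88.0 (highest)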
13 results across 10 score bands
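The band counts can be reproduced from the leaderboard scores. The sketch below assumes ten equal-width bands spanning the lowest to the highest score; the page does not state how the bands are defined, so the binning rule and the name `band_counts` are illustrative only.

```python
# Scores copied from the leaderboard on this page (% pass@2).
scores = [88, 72.9, 72, 61.7, 56.9, 55.1, 53.8, 53.3, 52.4, 40, 15.6, 11.1, 3.6]


def band_counts(values, bands=10):
    """Count values in equal-width bands between min(values) and max(values).

    Assumption: bands are equal-width and the top score falls in the last band.
    """
    low, high = min(values), max(values)
    width = (high - low) / bands
    counts = [0] * bands
    for value in values:
        # Clamp so that the maximum value lands in the final band.
        index = min(int((value - low) / width), bands - 1)
        counts[index] += 1
    return counts


counts = band_counts(scores)
print(counts)
```

Under that assumption each band is 8.44 points wide, and the result matches the distribution listed above.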

Methodology

225 Exercism-style coding exercises in C++, Go, Java, JavaScript, Python and Rust. Score = pass_rate_2 (% of exercises for which the model has produced a passing solution by its second attempt).
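As a rough illustration of how such a score could be computed, the sketch below takes, for each exercise, the attempt number on which the tests first passed (or `None` if they never did). It assumes an exercise counts toward the score if it passes on the first or the second attempt. The function name, the input format, and the example outcomes are hypothetical and are not taken from Aider's harness or from any model on this page.

```python
from typing import Optional, Sequence


def pass_rate_2(first_passing_attempt: Sequence[Optional[int]]) -> float:
    """Percentage of exercises solved within two attempts.

    Each entry is the 1-based attempt on which the exercise first passed,
    or None if it never passed. (Hypothetical input format.)
    """
    if not first_passing_attempt:
        raise ValueError("no exercise outcomes given")
    solved = sum(
        1 for attempt in first_passing_attempt
        if attempt is not None and attempt <= 2
    )
    return round(100 * solved / len(first_passing_attempt), 1)


# Made-up run over 225 exercises: 100 pass on the first attempt,
# 35 on the second, and 90 never pass.
outcomes = [1] * 100 + [2] * 35 + [None] * 90
score = pass_rate_2(outcomes)
print(score)
```

With these made-up outcomes, 135 of 225 exercises are solved within two attempts.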

Limitations

Aider's harness shapes prompts in a specific way — results may not directly compare to other coding benchmarks.

Full leaderboard

#   Model               Provider     Score (% pass@2)
1   GPT-5               OpenAI       88
2   Gemini 2.5 Pro      Google       72.9
3   o4-mini             OpenAI       72
4   o1                  OpenAI       61.7
5   DeepSeek R1         DeepSeek     56.9
6   DeepSeek V3         DeepSeek     55.1
7   o3                  OpenAI       53.8
8   Grok 3              xAI          53.3
9   GPT-4.1             OpenAI       52.4
10  Qwen: Qwen3 32B     Alibaba      40
11  Llama 4 Maverick    Meta         15.6
12  Codestral           Mistral AI   11.1
13  GPT-4o              OpenAI       3.6
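The summary figures under "At a glance" can be checked against the leaderboard rows. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the rows are copied from the table on this page.

```python
from statistics import mean, median

# (model, provider, score in % pass@2), copied from the leaderboard.
rows = [
    ("GPT-5", "OpenAI", 88), ("Gemini 2.5 Pro", "Google", 72.9),
    ("o4-mini", "OpenAI", 72), ("o1", "OpenAI", 61.7),
    ("DeepSeek R1", "DeepSeek", 56.9), ("DeepSeek V3", "DeepSeek", 55.1),
    ("o3", "OpenAI", 53.8), ("Grok 3", "xAI", 53.3),
    ("GPT-4.1", "OpenAI", 52.4), ("Qwen: Qwen3 32B", "Alibaba", 40),
    ("Llama 4 Maverick", "Meta", 15.6), ("Codestral", "Mistral AI", 11.1),
    ("GPT-4o", "OpenAI", 3.6),
]

row_scores = [score for _, _, score in rows]

summary = {
    "results": len(rows),
    "providers": len({provider for _, provider, _ in rows}),
    "average": round(mean(row_scores), 2),
    "median": median(row_scores),
    "range": (min(row_scores), max(row_scores)),
}
print(summary)
```

The recomputed values agree with the figures reported on this page: 13 results, 7 providers, an average of 48.95, a median of 53.8, and a range of 3.6 to 88.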
