Best vision AI models
AI models that understand images: chart reading, OCR, UI understanding, and visual reasoning.
All models below accept image input. They are ranked by overall intelligence and multimodal benchmark performance. Context sizes are in tokens, and input and output prices are in USD per 1M tokens.
1. o3
   - Score: 83.5
   - Tier: Frontier
   - Provider: OpenAI
   - Context: 200K
   - Input: $2/1M
   - Output: $8/1M
   - Tags: Vision, Math, Agentic, Long context, Frontier, Reasoning
2. Gemini 2 Pro
   - Score: 82.8
   - Tier: Frontier
   - Provider: Google
   - Context: 2M
   - Input: $1.25/1M
   - Output: $5/1M
   - Tags: Vision, Math, Agentic, Long context, Frontier, Code
3. Grok 3
   - Score: 80.1
   - Tier: Frontier
   - Provider: xAI
   - Context: 131K
   - Input: $3/1M
   - Output: $15/1M
   - Tags: Vision, Math, Agentic, Frontier, Reasoning
4. GPT-5
   - Score: 77.9
   - Tier: Strong
   - Provider: OpenAI
   - Context: 272K
   - Input: $1.25/1M
   - Output: $10/1M
   - Tags: Vision, Math, Agentic, Long context, Reasoning, Code
5. GPT-4o mini
   - Score: 77.1
   - Tier: Strong
   - Provider: OpenAI
   - Context: 128K
   - Input: $0.15/1M
   - Output: $0.6/1M
   - Tags: Vision, Math, Agentic, Budget
6. o1
   - Score: 76.3
   - Tier: Strong
   - Provider: OpenAI
   - Context: 200K
   - Input: $15/1M
   - Output: $60/1M
   - Tags: Vision, Math, Agentic, Long context, Reasoning
7. Claude Opus 4
   - Score: 75.7
   - Tier: Strong
   - Provider: Anthropic
   - Context: 200K
   - Input: $5/1M
   - Output: $25/1M
   - Tags: Vision, Agentic, Long context, Reasoning, Code
8. Claude Sonnet 4
   - Score: 66.2
   - Tier: Competent
   - Provider: Anthropic
   - Context: 200K
   - Input: $3/1M
   - Output: $15/1M
   - Tags: Vision, Agentic, Long context
9. GPT-5.4
   - Score: 59.3
   - Tier: Competent
   - Provider: OpenAI
   - Context: 1.1M
   - Tags: Vision, Agentic, Long context
10. Claude Opus 4.7
    - Score: 57.2
    - Tier: Competent
    - Provider: Anthropic
    - Context: 1M
    - Tags: Vision, Agentic, Long context
11. Qwen3.5-27B
    - Score: 53.6
    - Tier: Basic
    - Provider: Alibaba
    - Context: 262K
    - Input: $0.195/1M
    - Output: $1.56/1M
    - Tags: Vision, Long context
12. Gemma 4 31B
    - Score: 45.1
    - Tier: Basic
    - Provider: Google
    - Context: 262K
    - Input: $0.13/1M
    - Output: $0.38/1M
    - Tags: Vision, Long context, Budget
13. Claude Sonnet 4.6
    - Score: 44.8
    - Tier: Basic
    - Provider: Anthropic
    - Context: 1M
    - Tags: Vision, Agentic, Long context, Code
14. GPT-5.4 nano
    - Score: 43.3
    - Tier: Basic
    - Provider: OpenAI
    - Context: 272K
    - Tags: Vision, Agentic, Long context
15. GPT-5.4 mini
    - Score: 34.6
    - Tier: Limited
    - Provider: OpenAI
    - Context: 272K
    - Tags: Vision, Agentic, Long context
16. Claude Haiku 4.5
    - Score: 31.1
    - Tier: Limited
    - Provider: Anthropic
    - Context: 200K
    - Input: $1/1M
    - Output: $5/1M
    - Tags: Vision, Agentic, Long context
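
To compare models on cost, the per-1M prices in the listing can be turned into a per-request estimate. The sketch below is illustrative only: the `PRICES` table copies four entries from the listing, `estimate_cost` is a hypothetical helper, and it assumes an image is billed as ordinary input tokens at the listed input rate. How many tokens an image consumes differs by provider and is not stated here, so the token counts in the example are made up.

```python
# Prices in USD per 1M tokens (input, output), copied from the listing.
PRICES = {
    "o3": (2.00, 8.00),
    "Gemini 2 Pro": (1.25, 5.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: image content is counted inside input_tokens and billed
    at the same per-token rate as text input.
    """
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens (prompt plus image)
    # and 500 output tokens per request.
    for name in PRICES:
        cost = estimate_cost(name, 2_000, 500)
        print(f"{name}: ${cost:.6f} per request")
```

Multiplying the per-request figure by expected request volume gives a rough monthly budget. Models listed without prices cannot be estimated this way.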