Best vision AI models

AI models that understand images: chart reading, OCR, UI understanding, and visual reasoning.

All models below accept image input. They are ranked by overall intelligence and multimodal benchmark performance.

1. o3 (OpenAI)
   Score: 83.5 (Frontier)
   Context: 200K | Input: $2/1M | Output: $8/1M
   Tags: Vision, Math, Agentic, Long context, Frontier, Reasoning

2. Gemini 2 Pro (Google)
   Score: 82.8 (Frontier)
   Context: 2M | Input: $1.25/1M | Output: $5/1M
   Tags: Vision, Math, Agentic, Long context, Frontier, Code

3. Grok 3 (xAI)
   Score: 80.1 (Frontier)
   Context: 131K | Input: $3/1M | Output: $15/1M
   Tags: Vision, Math, Agentic, Frontier, Reasoning

4. GPT-5 (OpenAI)
   Score: 77.9 (Strong)
   Context: 272K | Input: $1.25/1M | Output: $10/1M
   Tags: Vision, Math, Agentic, Long context, Reasoning, Code

5. GPT-4o mini (OpenAI)
   Score: 77.1 (Strong)
   Context: 128K | Input: $0.15/1M | Output: $0.6/1M
   Tags: Vision, Math, Agentic, Budget

6. o1 (OpenAI)
   Score: 76.3 (Strong)
   Context: 200K | Input: $15/1M | Output: $60/1M
   Tags: Vision, Math, Agentic, Long context, Reasoning

7. Claude Opus 4 (Anthropic)
   Score: 75.7 (Strong)
   Context: 200K | Input: $5/1M | Output: $25/1M
   Tags: Vision, Agentic, Long context, Reasoning, Code

8. Claude Sonnet 4 (Anthropic)
   Score: 66.2 (Competent)
   Context: 200K | Input: $3/1M | Output: $15/1M
   Tags: Vision, Agentic, Long context

9. GPT-5.4 (OpenAI)
   Score: 59.3 (Competent)
   Context: 1.1M
   Tags: Vision, Agentic, Long context

10. Claude Opus 4.7 (Anthropic)
    Score: 57.2 (Competent)
    Context: 1M
    Tags: Vision, Agentic, Long context

11. Qwen3.5-27B (Alibaba)
    Score: 53.6 (Basic)
    Context: 262K | Input: $0.195/1M | Output: $1.56/1M
    Tags: Vision, Long context

12. Gemma 4 31B (Google)
    Score: 45.1 (Basic)
    Context: 262K | Input: $0.13/1M | Output: $0.38/1M
    Tags: Vision, Long context, Budget

13. Claude Sonnet 4.6 (Anthropic)
    Score: 44.8 (Basic)
    Context: 1M
    Tags: Vision, Agentic, Long context, Code

14. GPT-5.4 nano (OpenAI)
    Score: 43.3 (Basic)
    Context: 272K
    Tags: Vision, Agentic, Long context

15. GPT-5.4 mini (OpenAI)
    Score: 34.6 (Limited)
    Context: 272K
    Tags: Vision, Agentic, Long context

16. Claude Haiku 4.5 (Anthropic)
    Score: 31.1 (Limited)
    Context: 200K | Input: $1/1M | Output: $5/1M
    Tags: Vision, Agentic, Long context
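
The input and output prices in the listing can be turned into a rough per-request cost estimate. The sketch below is illustrative only: it assumes the "/1M" figures are US dollars per one million tokens, copies a handful of prices from the listing, and uses a hypothetical helper named `request_cost`. How an image is converted into input tokens differs between providers, so the token counts are left for the caller to supply.

```python
# Prices copied from the listing, as (input, output).
# Assumption: the unit is USD per 1M tokens.
PRICES_PER_1M = {
    "o3": (2.00, 8.00),
    "Gemini 2 Pro": (1.25, 5.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    input_tokens should already include whatever token count the
    provider assigns to the attached images.
    """
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Compare the same workload across the sampled models.
    for name in PRICES_PER_1M:
        print(f"{name}: ${request_cost(name, 10_000, 1_000):.4f}")
```

Because output tokens are priced several times higher than input tokens for every model listed with prices, a long generated answer can outweigh the cost of the image itself.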