Claude 4 family: Opus, Sonnet, Haiku — which model fits your use case
Anthropic's three-tier release strategy explained, with practical guidance on matching Claude models to production workloads.
Anthropic ships Claude models in three tiers that map roughly to capability, cost, and latency trade-offs. The Claude 4 family — Opus, Sonnet, and Haiku — follows this pattern more deliberately than previous generations.
Claude Opus 4
The flagship. At $15 input / $75 output per 1M tokens it's expensive, but it leads the family on SWE-bench (52%) and GPQA (74%) and holds up best on extended agentic tasks. The 200k context window makes it a strong choice for repository-level code review and long-document analysis, where accuracy matters more than throughput cost.
Claude Sonnet 4
The workhorse at $3 / $15 per 1M tokens. It sits at roughly 80–85% of Opus capability on most benchmarks for about 20% of the price. For most production coding assistants, customer support bots, and RAG pipelines, Sonnet is the correct default.
Claude Haiku 4.5
Released in October 2025 at $0.80 / $4 per 1M tokens. Latency is the primary draw — it's fast enough for real-time streaming UI and interactive agents where sub-second turn latency matters. Capability is noticeably lower than Sonnet's on complex tasks.
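To make the tier trade-off concrete, here is a back-of-envelope cost calculator using the per-token prices quoted above. The prices are copied from this article and may drift; check Anthropic's pricing page before relying on them. The model keys are informal labels, not API model IDs.

```python
# Per-1M-token list prices (input, output) as quoted in this article.
# These are illustrative; verify against Anthropic's current pricing.
PRICING = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "haiku-4.5": (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price (no caching/batch discounts)."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 4k-token prompt producing a 1k-token completion.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 4_000, 1_000):.4f}")
```

For that 4k-in / 1k-out request, the same call costs $0.135 on Opus, $0.027 on Sonnet, and $0.0072 on Haiku, which is the gap the routing pattern below exploits.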
The routing pattern
The strongest production setups we've seen use Haiku for intent classification and simple Q&A, Sonnet for the main task execution layer, and Opus for difficult reasoning steps that Sonnet fails on. Prompt routing via Haiku can save 60–80% of overall token cost at scale.