Analysis | September 1, 2025 | by AiWiki Editorial

AI API pricing in 2025: a year of cuts and what it means for builders

Token prices have fallen 60–90% across major providers since 2023. We look at the trajectory, what's driving it, and how to build pricing assumptions that hold.

In 2023, GPT-4 cost $60 per 1M output tokens. Today, GPT-4o costs $10 per 1M output tokens, and Gemini 2.0 Flash costs $0.40. The trend has been steady and shows no sign of stopping.

What's driving the cuts

Three factors compound: hardware efficiency (better H100 utilization, memory bandwidth improvements), model architecture improvements (smaller models that match larger predecessors), and competition. When Google cuts Gemini Flash prices, OpenAI responds within weeks.

The emerging tier structure

A clear tier structure has emerged:

Sub-$1/1M input: Gemini Flash, Claude Haiku — high-volume, latency-sensitive

$2–5/1M input: GPT-4o, Gemini Pro, Mistral Large — general production workloads

$10–15/1M input: GPT-5, Claude Opus, o1 — complex reasoning, agentic tasks
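One way to make these tiers actionable is to encode them as a lookup table that application code consults instead of hard-coding a model name. The sketch below is illustrative only: the tier keys (`budget`, `standard`, `premium`) and the task labels are hypothetical names, and the model lists and price bands are copied from the tiers above rather than from any provider's price sheet.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tier:
    input_price_band: Tuple[float, float]  # dollars per 1M input tokens
    models: Tuple[str, ...]
    use_case: str


# Bands and model names as listed in the tiers above; verify against
# current provider pricing before relying on them.
TIERS = {
    "budget": Tier((0.0, 1.0), ("Gemini Flash", "Claude Haiku"),
                   "high-volume, latency-sensitive"),
    "standard": Tier((2.0, 5.0), ("GPT-4o", "Gemini Pro", "Mistral Large"),
                     "general production workloads"),
    "premium": Tier((10.0, 15.0), ("GPT-5", "Claude Opus", "o1"),
                    "complex reasoning, agentic tasks"),
}

# Hypothetical mapping from an application's task label to a tier key.
TASK_TIER = {
    "classification": "budget",
    "summarization": "standard",
    "agent": "premium",
}


def tier_for(task: str) -> Tier:
    """Return the tier for a task label, defaulting to the standard tier."""
    return TIERS[TASK_TIER.get(task, "standard")]


print(tier_for("classification").models)  # ('Gemini Flash', 'Claude Haiku')
```

Because callers only see a task label, moving a workload to a cheaper tier is a one-line change to `TASK_TIER`.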

What this means for product decisions

The break-even point for AI features has moved dramatically. Tasks that were prohibitively expensive in 2023 are now viable at scale. A 500-token classification call on Gemini Flash costs $0.00025 (which implies a blended rate of about $0.50 per 1M tokens), or 4,000 calls for one dollar. At these prices, AI is feasible as a backend for nearly every user action.
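The arithmetic behind that per-call figure can be sketched as follows. The $0.50 per 1M rate is an assumption backed out of the $0.00025 figure above, not a published price; substitute your provider's current input and output rates.

```python
def cost_per_call(tokens: int, price_per_million: float) -> float:
    """Dollar cost of one call that consumes `tokens` tokens."""
    return tokens * price_per_million / 1_000_000


def calls_per_dollar(tokens: int, price_per_million: float) -> float:
    """How many calls of `tokens` tokens one dollar buys."""
    return 1_000_000 / (tokens * price_per_million)


# Assumed blended rate in dollars per 1M tokens (implied by the article's
# example, not taken from a price sheet).
BLENDED_RATE = 0.50

per_call = cost_per_call(500, BLENDED_RATE)
per_dollar = calls_per_dollar(500, BLENDED_RATE)
print(f"${per_call:.5f} per call, {per_dollar:,.0f} calls per dollar")
# $0.00025 per call, 4,000 calls per dollar
```

In practice input and output tokens are billed at different rates, so a production estimate would price the two separately and add them.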

Planning for further cuts

If you're building pricing models or unit economics projections, assume 30–50% price cuts per year for the next 2–3 years. Build abstractions that make it easy to swap models: at those rates, what costs $10/1M today would cost roughly $3.50–6/1M within 18 months.
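A quick way to stress-test a projection is to compound the assumed annual cut over the planning horizon. This is a minimal sketch: the function name is hypothetical, and a constant annual rate is a simplifying assumption, since real price cuts tend to arrive in discrete steps.

```python
def projected_price(price_today: float, annual_cut: float, months: int) -> float:
    """Price after `months`, assuming the cut compounds at a constant annual rate."""
    return price_today * (1 - annual_cut) ** (months / 12)


# Low and high ends of the 30-50% per year planning range.
for cut in (0.30, 0.50):
    price = projected_price(10.0, cut, 18)
    print(f"{cut:.0%} per year: ${price:.2f} per 1M after 18 months")
# 30% per year: $5.86 per 1M after 18 months
# 50% per year: $3.54 per 1M after 18 months
```

Running both ends of the range gives a band rather than a point estimate, which is usually the more honest input to a unit economics model.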

#pricing #api #trends #openai #google #anthropic