Gemini 2 Pro's 2 million token context: practical applications and limits
Google's Gemini 2 Pro offers the longest context window of any production API. We look at what actually works at that scale and where the model starts to degrade.
A 2 million token context window is roughly equivalent to ten full-length novels, or a medium-sized codebase with its documentation. Google's Gemini 2 Pro ships this as a production API feature, not a research preview, at $1.25 per million input tokens and $5 per million output tokens.
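To get a feel for what fills a window that size, a rough budgeting sketch helps. The ~4 characters-per-token ratio below is a common rule of thumb for English prose, not an exact tokenizer count; for real budgeting, use the API's token-counting endpoint.

```python
# Rough sketch: estimate whether a text corpus fits in a 2M-token window.
# The ~4 characters-per-token ratio is a heuristic for English prose,
# not a real tokenizer count.

CONTEXT_LIMIT = 2_000_000
CHARS_PER_TOKEN = 4  # rough heuristic for English text

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], limit: int = CONTEXT_LIMIT) -> bool:
    """Check whether the combined texts fit under the context limit."""
    return sum(estimate_tokens(t) for t in texts) <= limit

# A 100k-word stand-in text is ~500k characters here, i.e. ~125k
# estimated tokens, so a dozen or so of them fit under 2M.
novel = "word " * 100_000
print(fits_in_context([novel] * 13))   # → True
print(fits_in_context([novel] * 17))   # → False
```

Under this heuristic, a shelf of novels really does fit in one request, which is what makes the use cases below plausible at all.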
What works well
Repository-level questions are the clearest win. For codebases under roughly 500k tokens, loading the entire repository (tests and documentation included) and asking architectural questions or searching for cross-cutting patterns is genuinely faster and more accurate than chunked RAG.
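The "load the entire codebase" step can be as simple as concatenating files into one prompt. The sketch below is a minimal illustration: the `=== path ===` delimiter format and the extension allowlist are arbitrary choices, not anything the Gemini API requires.

```python
from pathlib import Path

# Sketch: pack a repository into a single prompt string for
# whole-codebase questions. Prefixing each file with its relative path
# lets the model cite specific files in its answer.

SOURCE_EXTS = {".py", ".md", ".toml"}  # illustrative allowlist

def pack_repository(root: str) -> str:
    """Concatenate source files under root, each prefixed with its path."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SOURCE_EXTS:
            rel = path.relative_to(root)
            parts.append(f"=== {rel} ===\n{path.read_text(errors='replace')}")
    return "\n\n".join(parts)
```

The packed string plus the question then goes to the model in a single request; checking the packed size against the token budget first (as sketched earlier) avoids surprise truncation.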
Long-form document analysis — regulatory filings, legal contracts, large technical specifications — also benefits. When the entire document fits in one context, the model can answer cross-referential questions that RAG frequently gets wrong due to retrieval misses.
Where it degrades
Needle-in-a-haystack retrieval accuracy starts to drop past 500k–800k tokens for questions that require precise fact recall from the middle of the document. This has improved substantially since the 1.5 generation, but the gap hasn't fully closed.
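This kind of degradation is straightforward to measure on your own workload. Below is a sketch of how such probes are typically constructed; the filler text, needle sentence, and depth grid are arbitrary choices for illustration, and the scoring step (sending each probe to the model and checking the answer) is omitted.

```python
# Sketch: build needle-in-a-haystack probes at varying depths so recall
# can be measured across context sizes. Filler, needle, and depths are
# illustrative choices.

FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret deployment code is FERN-7421."

def build_probe(total_tokens: int, depth: float, chars_per_token: int = 4) -> str:
    """Place the needle at `depth` (0.0 = start, 1.0 = end) of a haystack
    of roughly `total_tokens` tokens of filler text."""
    haystack = FILLER * (total_tokens * chars_per_token // len(FILLER) + 1)
    cut = int(len(haystack) * depth)
    return haystack[:cut] + NEEDLE + haystack[cut:]

# Sweep depths at one context size; each probe would be sent with the
# question "What is the secret deployment code?" and the answers scored.
probes = [build_probe(500_000, d) for d in (0.1, 0.5, 0.9)]
```

Plotting accuracy by depth and context size reproduces the mid-document dip described above, and tells you where the cliff is for your specific task.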
Context poisoning is a real risk at scale: when irrelevant information crowds out the relevant signal, performance on the actual task suffers. Filtering and summarizing before loading large contexts often outperforms naive dump-everything approaches.
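A minimal version of that filtering step can be sketched as follows. The bag-of-words overlap score is a deliberately naive stand-in for a real reranker or embedding similarity, used only to show the shape of the approach.

```python
# Sketch: filter candidate documents by keyword overlap with the query
# before loading them into context, to limit context poisoning. The
# overlap score is a naive stand-in for a real reranker.

def overlap_score(query: str, doc: str) -> float:
    """Fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def select_relevant(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Keep only the top_k documents by overlap with the query."""
    ranked = sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:top_k]

docs = [
    "retry logic with exponential backoff for the payments client",
    "quarterly planning notes and team offsite agenda",
    "database connection pooling and retry configuration",
]
print(select_relevant("retry backoff payments", docs, top_k=2))
```

Even this crude filter drops the planning-notes document before it can crowd out the relevant signal; with a large budget, the point is not to skip selection entirely but to spend the window on material that can plausibly matter.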
Pricing reality
A 1M-token context request (roughly 750 pages) costs $1.25 in input tokens. For exploratory analysis this is fine. For a production feature called many times per day, the costs scale quickly: a 200k-token context at 1,000 requests/day is 200M input tokens, or about $250/day before counting any output tokens.
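That arithmetic is worth making explicit, since it is what kills naive dump-everything designs in production. The sketch below uses the $1.25/1M input rate quoted in this article; check current Gemini pricing before budgeting, since rates and long-context tiers change.

```python
# Sketch of the input-token cost arithmetic above, using the input rate
# quoted in this article. Verify current pricing before relying on it.

INPUT_PRICE_PER_M = 1.25  # USD per 1M input tokens

def daily_input_cost(context_tokens: int, requests_per_day: int) -> float:
    """Input-token cost per day in USD for a fixed-size context."""
    tokens_per_day = context_tokens * requests_per_day
    return tokens_per_day / 1_000_000 * INPUT_PRICE_PER_M

print(daily_input_cost(200_000, 1_000))   # the article's example: 250.0
print(daily_input_cost(1_000_000, 1))     # one full 1M-token call: 1.25
```

Context caching, where available, changes this calculus for repeated calls over the same large context, which is exactly the production pattern described here.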