Blog · Jun 12, 2026
3 New Gemini Models: 3.1 Flash-Lite, 3 Flash & 2.5 Flash-Lite - Pricing & Comparison
Google dropped 3 new Gemini models this week — all with 1M context windows, all cheaper than you'd expect. Here's what each one costs and when to use it.
Google quietly released three new Gemini API models in June 2026, bringing the total Gemini lineup to 8 models. The big news: all three have 1M token context windows and aggressive budget pricing. If you're still using Gemini 2.0 Flash (now deprecated), it's time to upgrade.
The 3 New Models at a Glance
| Model | Tier | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) | Replaces |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | Budget | $0.10 | $0.40 | 1M | Gemini 2.0 Flash-Lite |
| Gemini 3.1 Flash-Lite | Budget | $0.25 | $1.50 | 1M | — |
| Gemini 3 Flash | Budget | $0.50 | $3.00 | 1M | Gemini 2.0 Flash |
All prices are per 1M tokens, verified Jun 12, 2026. Google has deprecated Gemini 2.0 Flash and Gemini 2.0 Flash-Lite.
1. Gemini 2.5 Flash-Lite — The Cheapest Option
At $0.10/$0.40 per 1M tokens (input/output), this is the cheapest Gemini model ever, and one of the cheapest LLM APIs on the market. It matches the deprecated Gemini 2.0 Flash on price and keeps the same 1M token context window, now on a newer architecture.
Best for: High-volume classification, content moderation, chatbot backends, embeddings preprocessing — anything where you're processing millions of tokens and cost is the primary concern.
Trade-off: As a "Lite" model, it may not handle complex reasoning as well as Gemini 3 Flash or Gemini 2.5 Pro. Use it for simple, high-volume tasks.
2. Gemini 3.1 Flash-Lite — The Production Budget Pick
At $0.25/$1.50 per 1M tokens, Gemini 3.1 Flash-Lite sits in a sweet spot: it costs 2.5x as much as 2.5 Flash-Lite on input ($0.25 vs $0.10) and 3.75x as much on output ($1.50 vs $0.40), in exchange for better generation quality. The 1M context window handles long documents and codebases with ease.
Best for: Production workloads that need good quality at low cost, such as chatbots, content generation, summarization, and RAG pipelines. A sensible upgrade from the deprecated Gemini 2.0 Flash-Lite when 2.5 Flash-Lite, its like-for-like replacement, isn't good enough.
Trade-off: Output at $1.50/M is 6x the input price ($0.25/M). For output-heavy workloads, Gemini 2.5 Flash-Lite ($0.40 output) is cheaper, but lower quality.
3. Gemini 3 Flash — The Speed Demon
At $0.50/$3.00 per 1M tokens, Gemini 3 Flash is the replacement for deprecated Gemini 2.0 Flash. It costs 5x as much on input ($0.50 vs $0.10) and 7.5x as much on output ($3.00 vs $0.40), but brings the Gemini 3 architecture: faster inference, better multimodal support, and the full 1M context window.
Best for: Applications that need speed + quality at a reasonable price. Great for real-time chat, code generation, multimodal tasks (image + text), and Google Cloud workloads.
Trade-off: 5x the input price and 7.5x the output price of Gemini 2.5 Flash-Lite. If you don't need the Gemini 3 architecture improvements, the cheaper models are fine.
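The price multiples quoted for the three models are easy to mix up, because input and output scale differently. The sketch below recomputes them from the per-1M-token prices listed in this article. The dictionary keys are display names chosen for illustration, not official API model IDs, and the helper names are hypothetical.

```python
# Prices in USD per 1M tokens as (input, output), taken from this article's table.
# Keys are display names for illustration only, not official API model IDs.
PRICES = {
    "2.5 Flash-Lite": (0.10, 0.40),
    "3.1 Flash-Lite": (0.25, 1.50),
    "3 Flash": (0.50, 3.00),
}


def multiples(model: str, baseline: str) -> tuple[float, float]:
    """Return (input multiple, output multiple) of `model` relative to `baseline`."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return round(model_in / base_in, 2), round(model_out / base_out, 2)


def output_to_input_ratio(model: str) -> float:
    """How many times more an output token costs than an input token."""
    in_price, out_price = PRICES[model]
    return round(out_price / in_price, 2)


if __name__ == "__main__":
    for name in PRICES:
        print(name, multiples(name, "2.5 Flash-Lite"), output_to_input_ratio(name))
```

Under these prices, the output-to-input ratio is 4x for 2.5 Flash-Lite and 6x for both 3.1 Flash-Lite and 3 Flash, so the gap between tiers widens as a workload becomes more output-heavy.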
Monthly Cost at 1,000 Requests/Day
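Assuming 1,000 input tokens and 500 output tokens per request over a 30-day month (30M input and 15M output tokens in total):
| Model | Monthly cost |
|---|---|
| 2.5 Flash-Lite | $9.00 |
| 3.1 Flash-Lite | $30.00 |
| 3 Flash | $60.00 |
| Claude Haiku 4.5 | $105.00 |
| GPT-4o mini | $13.50 |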
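To adapt this estimate to your own traffic, the arithmetic can be sketched as a small function. This is a minimal example that assumes a 30-day month and uses the per-1M-token prices quoted in this article. The function name and defaults are illustrative.

```python
# Prices in USD per 1M tokens as (input, output), as quoted in this article.
PRICES = {
    "2.5 Flash-Lite": (0.10, 0.40),
    "3.1 Flash-Lite": (0.25, 1.50),
    "3 Flash": (0.50, 3.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o mini": (0.15, 0.60),
}


def monthly_cost(
    input_price: float,
    output_price: float,
    requests_per_day: int = 1_000,
    input_tokens: int = 1_000,
    output_tokens: int = 500,
    days: int = 30,  # assumption: a 30-day billing month
) -> float:
    """Estimated monthly spend in USD; prices are USD per 1M tokens."""
    requests = requests_per_day * days
    input_millions = requests * input_tokens / 1_000_000
    output_millions = requests * output_tokens / 1_000_000
    return round(input_millions * input_price + output_millions * output_price, 2)


if __name__ == "__main__":
    for name, (in_price, out_price) in PRICES.items():
        print(f"{name}: ${monthly_cost(in_price, out_price):.2f}")
```

With the defaults, a month is 30M input tokens and 15M output tokens, so 2.5 Flash-Lite works out to 30 × $0.10 + 15 × $0.40 = $9.00. Changing `output_tokens` shows quickly how much the output price dominates the bill for the pricier models.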
How These Compare to the Competition
Here's how the new Gemini budget models stack up against other cheap options:
- Gemini 2.5 Flash-Lite vs DeepSeek V4 Flash: Gemini is cheaper on input ($0.10 vs $0.14) but more expensive on output ($0.40 vs $0.28). DeepSeek wins on output-heavy tasks.
- Gemini 3.1 Flash-Lite vs Claude Haiku 4.5: Gemini is 75% cheaper on input ($0.25 vs $1.00) and 70% cheaper on output ($1.50 vs $5.00). Haiku has better reasoning, but for most tasks Gemini is the better value.
- Gemini 3 Flash vs GPT-4o mini: GPT-4o mini is cheaper at $0.15/$0.60 vs Gemini 3 Flash at $0.50/$3.00. But Gemini 3 Flash has 1M context vs GPT-4o mini's 128K: roughly 8x more context for about 3.3x the input price and 5x the output price.
- All 3 vs Mistral Small 4: Mistral Small 4 at $0.10/$0.30 matches Gemini 2.5 Flash-Lite on input and undercuts all three new models on output, but has only 128K context vs 1M. For long documents, the new Gemini models win.
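When one model is cheaper on input and the other on output, the winner depends on the ratio of output to input tokens in your workload. The sketch below computes the percentage savings and the break-even ratio from the prices quoted in the comparisons above. The function names are illustrative.

```python
from typing import Optional, Tuple

Price = Tuple[float, float]  # (input, output) in USD per 1M tokens


def pct_cheaper(price: float, reference: float) -> float:
    """Percentage by which `price` undercuts `reference`."""
    return round((1 - price / reference) * 100, 1)


def breakeven_output_ratio(a: Price, b: Price) -> Optional[float]:
    """Output/input token ratio at which models a and b cost the same.

    Returns None when one model is cheaper or equal on both prices,
    because then there is no crossover point.
    """
    a_in, a_out = a
    b_in, b_out = b
    input_saving = b_in - a_in      # what a saves per input token
    output_penalty = a_out - b_out  # what a costs extra per output token
    if input_saving * output_penalty <= 0:
        return None
    return round(input_saving / output_penalty, 3)


if __name__ == "__main__":
    gemini_25_flash_lite = (0.10, 0.40)
    deepseek_v4_flash = (0.14, 0.28)
    print(breakeven_output_ratio(gemini_25_flash_lite, deepseek_v4_flash))
    print(pct_cheaper(0.25, 1.00), pct_cheaper(1.50, 5.00))
```

At the prices quoted here, Gemini 2.5 Flash-Lite and DeepSeek V4 Flash break even at about one output token for every three input tokens. Below that ratio Gemini is cheaper, and above it DeepSeek is.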
Which One Should You Use?
- Absolute cheapest: Gemini 2.5 Flash-Lite at $0.10/$0.40, the lowest-priced model in the Gemini lineup
- Best quality per dollar: Gemini 3.1 Flash-Lite at $0.25/$1.50 with 1M context
- Speed + quality balance: Gemini 3 Flash at $0.50/$3.00, with the fastest inference
- Need 1M context on a budget: Any of these, since all three have 1M windows
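If you'd rather encode this decision than eyeball it, the selection can be sketched as a filter on context size followed by a cost comparison. This is a minimal illustration that uses only the prices and context sizes quoted in this article. It treats "1M" and "128K" as 1,000,000 and 128,000 tokens and assumes the context window must hold the prompt plus the output, both of which are simplifying assumptions. It ignores quality and speed entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context_tokens: int  # assumed: "1M" = 1_000_000, "128K" = 128_000


# Only models whose price and context size both appear in this article.
CATALOG = [
    Model("Gemini 2.5 Flash-Lite", 0.10, 0.40, 1_000_000),
    Model("Gemini 3.1 Flash-Lite", 0.25, 1.50, 1_000_000),
    Model("Gemini 3 Flash", 0.50, 3.00, 1_000_000),
    Model("GPT-4o mini", 0.15, 0.60, 128_000),
    Model("Mistral Small 4", 0.10, 0.30, 128_000),
]


def request_cost(model: Model, prompt_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    return (model.input_price * prompt_tokens
            + model.output_price * output_tokens) / 1_000_000


def cheapest_fit(prompt_tokens: int, output_tokens: int,
                 catalog: List[Model] = CATALOG) -> Optional[Model]:
    """Cheapest model whose context window can hold the prompt plus the output."""
    needed = prompt_tokens + output_tokens
    fits = [m for m in catalog if m.context_tokens >= needed]
    if not fits:
        return None
    return min(fits, key=lambda m: request_cost(m, prompt_tokens, output_tokens))


if __name__ == "__main__":
    print(cheapest_fit(1_000, 500))
    print(cheapest_fit(400_000, 2_000))
```

For short prompts the 128K models stay in the running on price alone. Once the prompt exceeds 128K tokens, only the three Gemini models remain and 2.5 Flash-Lite is the cheapest of them.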
The Bigger Picture
These 3 new models push the APIpulse database to 42 models across 10 providers. The Gemini lineup now covers every price point from $0.10 to $12.50 per 1M input tokens — the widest range of any single provider.
The trend is clear: 1M context windows are becoming the standard, even at budget price points. If you're still using models with 128K context, you're paying for less capability than you could get.
With Claude 4 shutting down June 15, now is the perfect time to evaluate these new Gemini options. Use our deprecation calculator to see how much you'd save by switching.
Pricing data verified Jun 12, 2026. Prices per 1M tokens.