๐Ÿ”ฅ Limited time: Pro lifetime access $29 โ€” price goes up July 12 โ†’

State of LLM Pricing: Q2 2026

48 models. 10 providers. The definitive quarterly report on what's cheap, what's expensive, and what's changed since last quarter.

๐Ÿšจ Claude 4 retired June 15: See all 48 alternatives, calculate your savings, and get migration code on our Claude 4 Migration Hub.

The LLM API market in Q2 2026 looks nothing like it did six months ago. Prices have cratered for some models and skyrocketed for others. Budget-tier models now match 2024 flagship quality. And the smartest teams have stopped picking one model โ€” they're routing dynamically based on task complexity.

We track every price across every provider. Here's the full picture.

Updated Jun 1, 2026: Since this report was published, xAI rebranded Grok 3 โ†’ Grok 4.3 ($1.25/$2.50, down from $30/$150) and Grok 3 Mini โ†’ Grok Build 0.1 ($0.30/$0.50). The analysis below reflects May 2026 pricing. See xAI pricing for current data.

Q2 2026 at a Glance

-75%
Biggest price drop
(Mistral Large, DeepSeek V4 Pro)
+10x
Biggest price hike
(Grok 3)
$0.075
Cheapest input
(Gemini Flash Lite)
34
Models tracked
across 10 providers

The Biggest Price Moves

If you haven't re-evaluated your AI provider in the last quarter, you're almost certainly overpaying. Here are the moves that matter:

The Drops

Model Old Price (input) New Price (input) Change
Mistral Large 3 $2.00/1M $0.50/1M -75%
DeepSeek V4 Pro $1.75/1M $0.44/1M -75%
GPT-4o $10.00/1M $2.50/1M -67%
Claude Opus 4.7 $15.00/1M $5.00/1M -67%

The Hike

Model Old Price (input) New Price (input) Change
Grok 3 $3.00/1M $30.00/1M +10x

Grok 3's 10x price increase is the largest single-model price hike we've ever tracked. At $30/$150 per 1M tokens, it's now the most expensive model in our database โ€” pricier than GPT-5.5 and Claude Opus 4.7 combined. If you were using Grok 3 for production workloads, it's time to switch.

Full Pricing Matrix: Every Model, Every Provider

Here's the complete current pricing as of May 2026, organized by tier.

Premium Tier

Maximum capability. Best for complex reasoning, multimodal tasks, and high-stakes outputs.

Model Provider Input Output Context
GPT-5.5 Pro OpenAI $30.00 $180.00 1M
Grok 3 xAI $30.00 $150.00 128K
Claude 4 Opus Anthropic $15.00 $75.00 200K
GPT-5.5 OpenAI $5.00 $30.00 1M
Claude Opus 4.7 Anthropic $5.00 $25.00 1M
Mid Tier

Strong performance at reasonable cost. The sweet spot for most production workloads.

Model Provider Input Output Context
Claude Sonnet 4.6 Anthropic $3.00 $15.00 1M
GPT-4o OpenAI $2.50 $10.00 128K
Command R+ Cohere $2.50 $10.00 128K
Gemini 3.1 Pro Google $2.00 $12.00 1M
Jamba 1.5 Large AI21 $2.00 $8.00 256K
GPT-5.3 Codex OpenAI $1.75 $14.00 400K
GPT-5 OpenAI $1.25 $10.00 272K
Gemini 2.5 Pro Google $1.25 $10.00 1M
Claude Haiku 4.5 Anthropic $1.00 $5.00 200K
Llama 3.1 70B Meta (Together.ai) $0.88 $0.88 128K
Budget Tier

Production-viable quality at rock-bottom prices. The biggest story of Q2 2026.

Model Provider Input Output Context
Kimi K2.6 Moonshot $0.90 $3.75 256K
Mistral Large 3 Mistral $0.50 $1.50 128K
DeepSeek V4 Pro DeepSeek $0.44 $0.87 1M
Command R Cohere $0.50 $1.50 128K
Gemini 2.5 Flash-Lite Google $0.10 $0.40 1M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 1M
GPT-5 mini OpenAI $0.25 $2.00 272K
GPT-4o mini OpenAI $0.15 $0.60 128K
GPT-oss 120B OpenAI $0.15 $0.60 128K
Mistral Small 4 Mistral $0.15 $0.60 128K
Llama 4 Scout Meta (Together.ai) $0.18 $0.34 10M
Llama 4 Maverick Meta (Together.ai) $0.20 $0.60 10M
Llama 3.1 8B Meta (Together.ai) $0.10 $0.10 128K
GPT-oss 20B OpenAI $0.08 $0.35 128K
Gemini 2.5 Flash-Lite Google $0.075 $0.30 1M
400x

Price difference between the most expensive model (GPT-5.5 Pro at $30/$180) and the cheapest (Gemini Flash Lite at $0.075/$0.30)

Five Trends Defining Q2 2026

1. Budget Models Are Now Production-Viable

This is the biggest story of the quarter. Models like Gemini 2.5 Flash-Lite ($0.075/1M), Llama 3.1 8B ($0.10/1M), and DeepSeek V4 Flash ($0.14/1M) deliver quality that matches or exceeds 2024's GPT-4 โ€” at 1/20th the price. For classification, summarization, and simple chat, there's no reason to pay premium prices anymore.

2. Context Windows Exploded

1M token context windows are now standard across premium and mid-tier models. Budget models are catching up: Llama 4 Scout offers 1M tokens of context via Together.ai. DeepSeek V4 Pro and V4 Flash both support 1M. If you're still chunking documents to fit 128K windows, you're leaving money on the table.

3. Price Volatility Is the New Normal

Grok 3 went up 10x. DeepSeek V4 Pro went down 75%. GPT-4o dropped 67%. These aren't incremental adjustments โ€” they're seismic shifts. The era of stable, predictable AI pricing is over. If you're not checking prices quarterly, you're flying blind.

4. Multi-Model Routing Is the Optimal Strategy

The best teams in 2026 don't pick one model. They route dynamically:

A blended cost of under $2/1M tokens is achievable for most workloads.

5. Batch APIs Changed the Math

OpenAI's Batch API offers a 50% discount. Anthropic and Google offer similar batch pricing. If your workload isn't time-sensitive โ€” data labeling, content generation, document processing โ€” batch everything. The savings are massive at scale.

Cost Comparison: Real Workloads

Let's see what these prices actually mean for production systems. Here are four common workloads compared across price tiers:

AI Coding Assistant

2K input + 1.5K output tokens, 500 requests/day
Premium (GPT-5.5)$247.50/mo
Mid (Claude Sonnet 4.6)$142.50/mo
Budget (DeepSeek V4 Pro)$7.88/mo

RAG Pipeline

5K input + 800 output tokens, 1K requests/day
Premium (GPT-5.5)$750.00/mo
Mid (Gemini 3.1 Pro)$264.00/mo
Budget (DeepSeek V4 Pro)$21.33/mo

Customer Support Chatbot

1.5K input + 500 output tokens, 2K requests/day
Premium (Claude Opus 4.7)$420.00/mo
Mid (GPT-4o)$195.00/mo
Budget (Gemini Flash)$13.20/mo

Content Generation

1K input + 3K output tokens, 200 requests/day
Premium (GPT-5.5)$570.00/mo
Mid (Claude Sonnet 4.6)$288.00/mo
Budget (DeepSeek V4 Pro)$16.27/mo
$564K

Annual savings switching from GPT-5.5 to DeepSeek V4 Pro at 100M tokens/day

Provider-by-Provider Breakdown

OpenAI

The broadest model lineup (8 models). GPT-4o's 67% price drop made it a mid-tier option. GPT-5.5 remains the reasoning king at $5/$30. The new GPT-oss models (20B and 120B) are OpenAI's first budget-tier open-weight offerings. Best for: Complex reasoning, multimodal tasks, code generation.

Anthropic

Claude Opus 4.7 dropped from $15 to $5 input โ€” a 67% reduction. Claude Sonnet 4.6 at $3/$15 is the best mid-tier value for long-context work (1M tokens). Haiku 4.5 at $1/$5 fills the budget gap. Best for: Long-form writing, analysis, extended context tasks.

Google

GPT-oss 20B at $0.08/$0.35 is the cheapest model in our database. Gemini 3.1 Pro ($2/$12) offers flagship quality at mid-tier pricing. All models support 1M context. Best for: High-volume budget workloads, long-context analysis.

DeepSeek

The price-to-performance champion. V4 Pro at $0.44/$0.87 with 1M context is absurdly cheap for its capability level. V4 Flash at $0.14/$0.28 is even cheaper. Best for: Cost-sensitive production workloads, high-volume processing.

Mistral

Mistral Large 3's 75% price drop (from $2 to $0.50) repositioned it as a budget model. Mistral Small 4 at $0.10/$0.30 competes directly with GPT-4o mini. Best for: European compliance needs, budget workloads.

xAI

Grok 3's 10x price hike ($3 โ†’ $30) makes it the most expensive model in our database. Grok 3 Mini at $3/$5 is more reasonable. Verdict: Hard to recommend at current pricing unless you need Grok-specific capabilities.

Meta (via Together.ai)

Llama 4 Scout (1M context) is excellent for long-document workloads. Llama 3.1 8B ($0.10/$0.10) remains the cheapest model for simple tasks. Best for: Self-hosted flexibility, massive context windows.

Decision Framework

Which Model Should You Use?

  • Tightest budget, simple tasks: Gemini 2.5 Flash-Lite ($0.075/1M) โ€” cheapest option, 1M context
  • Best value for general use: DeepSeek V4 Pro ($0.44/1M) โ€” 91% cheaper than GPT-5.5 with 1M context
  • Best mid-tier quality: Claude Sonnet 4.6 ($3/1M) or GPT-5 ($1.25/1M) โ€” strong reasoning at reasonable cost
  • Maximum capability: GPT-5.5 ($5/1M) or Claude Opus 4.7 ($5/1M) โ€” top-tier for complex tasks
  • Longest context: Llama 4 Scout (1M context) via Together.ai
  • Code-heavy workloads: DeepSeek V4 Pro ($0.44/1M) or GPT-5.3 Codex ($1.75/1M)
  • Batch processing: Any model via Batch API for 50% off โ€” route to cheapest that meets quality needs

What to Watch in Q3 2026

The trend is clear: prices will keep falling for budget and mid-tier models, while premium models hold steady or increase. The gap between "cheapest viable" and "best available" will keep widening.

Methodology

All pricing data is sourced directly from provider documentation and verified against API responses. Prices shown are per 1M tokens unless otherwise noted. Context windows reflect the maximum supported by each model. Data was last verified on Jun 3, 2026.

We track 48 models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21. Prices are checked monthly and updated in our pricing changelog.

Calculate your exact costs across all 48 models

Interactive calculators, savings comparisons, and model recommendations โ€” free, no signup.

Try the Calculator โ€” Free

โ€” See if you're overpaying for AI APIs

๐ŸŽฏ API Cost Score

Rate your API setup โ€” get a letter grade in 30 seconds

๐ŸŽฏ Rate Your API Setup in 30 Seconds

Get an A+ to F grade on your AI API costs. See how you compare and find cheaper alternatives instantly.

Get Your Cost Score โ†’

๐Ÿ“Š Generate Your Personalized API Cost Report

Select your model, enter your monthly spend, and get a custom savings report with cheaper alternatives โ€” free, in 60 seconds.

Related Articles

Save money: ๐Ÿ“Š Live API Pricing ยท Cost Optimizer โ€” find out how much you could save by switching models. Free tool.

Want to optimize your AI API costs?

APIpulse Pro ($29 one-time) includes saved scenarios, cost report exports, and personalized recommendations that can save you up to 40%.

Get Pro — $29
๐Ÿ’ธ Looking for DeepSeek V4 Flash Alternatives?
5 models ranked by cost โ€” some offer better quality at similar prices.
See 5 DeepSeek V4 Flash Alternatives โ†’
๐Ÿ’ธ Looking for Sonnet 4.6 Alternatives?
5 models ranked by cost โ€” some are 90% cheaper.
See 5 Sonnet 4.6 Alternatives โ†’
๐Ÿ’ธ Looking for Llama 4 Maverick Alternatives?
5 models ranked by cost โ€” some are 95% cheaper.
See 5 Llama 4 Maverick Alternatives โ†’
๐Ÿ’ธ Looking for Mistral Small 4 Alternatives?
5 models ranked by cost โ€” some are 90% cheaper.
See 5 Mistral Small 4 Alternatives โ†’
๐Ÿ’ธ Looking for Gemini 3.1 Pro Alternatives?
5 models ranked by cost โ€” some are 95% cheaper.
See 5 Gemini 3.1 Pro Alternatives โ†’
๐Ÿ’ธ Looking for Llama 4 Scout Alternatives?
5 models ranked by cost โ€” some are 95% cheaper.
See 5 Llama 4 Scout Alternatives โ†’
๐Ÿ”ง Free Embeddable Pricing Widget
Add live AI API pricing to your docs, blog, or README with one script tag. 48 models, auto-updating.
Get the Free Widget โ†’ Free MCP Server โ†’