Pricing Report

State of LLM API Pricing — June 2026: 34 Models Compared

June 1, 2026 12 min read Last updated: May 29, 2026

The LLM API market in June 2026 is more diverse — and more confusing — than ever. With 34 models across 10 providers, choosing the right API for your project can save you thousands of dollars per month or cost you 100x more than necessary.

This report breaks down every major LLM API's pricing, ranks them by cost, and gives you actionable insights to optimize your spending. All data is verified as of May 29, 2026.

→ View the full interactive report with charts and calculators

The Quick Answer: Cheapest Models by Tier

Tier Cheapest Model Input $/M Output $/M
Budget Gemini 2.0 Flash Lite $0.075 $0.30
Mid-Tier Llama 3.1 70B (Together.ai) $0.88 $0.88
Premium DeepSeek V4 Pro $0.44 $0.87

Yes, you read that right. DeepSeek V4 Pro — a premium-tier model — is cheaper than most mid-tier models from OpenAI and Anthropic. The pricing landscape has shifted dramatically in 2026.

Price Spread: 67x Difference Between Cheapest and Most Expensive

The gap between the cheapest and most expensive models has widened significantly:

This means your choice of model can be the difference between a $50/month bill and a $3,000/month bill for the same workload.

Provider-by-Provider Breakdown

OpenAI (9 models)

OpenAI has the widest model range, from the ultra-cheap GPT-oss models ($0.08/$0.35) to the premium GPT-5.5 Pro ($30/$180). The sweet spot is GPT-5 mini at $0.25/$2.00 — 5x cheaper than GPT-5 with most of the capability.

Anthropic (6 models)

Anthropic's pricing is more uniform. Claude Haiku 4.5 ($1/$5) is the best value, while Claude Opus 4.8 ($5/$25) is the premium option. Note: Claude 4 Opus and Claude Sonnet 4 are retiring June 15, 2026 — migrate now.

Google (4 models)

Google dominates the budget tier. Gemini 2.0 Flash Lite ($0.075/$0.30) is the absolute cheapest model available. Gemini 2.0 Flash ($0.10/$0.40) is close behind. Even Gemini 3.1 Pro ($2/$12) is competitive with mid-tier offerings.

DeepSeek (3 models)

DeepSeek is the value champion. DeepSeek V4 Pro ($0.44/$0.87) offers near-premium quality at budget prices. DeepSeek V4 Flash ($0.14/$0.28) is even cheaper. Note: DeepSeek V3 is retiring June 15, 2026.

Mistral (2 models)

Mistral offers straightforward budget pricing. Mistral Large 3 ($0.50/$1.50) and Mistral Small 4 ($0.15/$0.60) are both solid choices for cost-conscious developers.

Cohere (2 models)

Cohere's Command R ($0.50/$1.50) is competitive for RAG workloads. Command R+ ($2.50/$10) targets enterprise use cases.

Meta via Together.ai (4 models)

Open-source models via Together.ai. Llama 3.1 8B ($0.10/$0.10) is the cheapest model with equal input/output pricing. Llama 4 Scout and Maverick offer larger context windows (10M tokens) but are only available on dedicated inference.

Others (Moonshot, xAI, AI21)

Moonshot's Kimi K2.6 ($0.95/$4.00) targets Chinese-language workloads. xAI's Grok 4.3 ($1.25/$2.50) is a budget-friendly option; Grok Build 0.1 ($0.30/$0.50) joins the budget tier. AI21's Jamba 1.5 Large ($2/$8) is a solid mid-tier option.

Cost Scenarios: What You'll Actually Pay

Scenario: Moderate Usage (1M input + 500K output tokens/day)

This is typical for a production chatbot or RAG pipeline serving ~1,000 users/day.

  • Cheapest: Gemini Flash Lite — $6.75/mo
  • Best value: DeepSeek V4 Flash — $16.80/mo
  • Mid-tier: Claude Haiku 4.5 — $105/mo
  • Premium: GPT-5 — $600/mo
  • Most expensive: GPT-5.5 Pro — $36,000/mo

The difference between the cheapest and most expensive option for the same workload is 5,333x. Even between "reasonable" choices, you can easily save 50-90% by picking the right model.

5 Key Insights for June 2026

  1. Budget models are production-ready. Gemini Flash and DeepSeek V4 Flash aren't just cheap — they're good enough for most production workloads. The quality gap between budget and premium has narrowed significantly.
  2. Output tokens are the real cost. Across all providers, output tokens cost 3-6x more than input. Optimize your prompts to minimize output length. Use structured outputs (JSON) instead of free-form text where possible.
  3. A deprecation wave is coming. Claude 4 Opus, Claude Sonnet 4, and DeepSeek V3 all retire on June 15, 2026. If you're using these models, migrate now to avoid service disruption.
  4. Context windows are converging at 1M. Most mid-tier and premium models now offer 1M+ token context windows. This makes RAG and long-document processing more cost-effective than ever.
  5. Mini/Flash variants are the sweet spot. GPT-5 mini, Gemini Flash, and Claude Haiku offer 80% of the capability at 20% of the price. For most production workloads, these are the optimal choice.

Deprecation Alert: Migrate Before June 15

⚠ Three models retiring June 15, 2026

  • Claude 4 Opus ($15/$75) → Claude Opus 4.8 ($5/$25) — actually cheaper!
  • Claude Sonnet 4 ($3/$15) → Claude Sonnet 4.6 ($3/$15) — same price, better model
  • DeepSeek V3 ($0.27/$1.10) → DeepSeek V4 Flash ($0.14/$0.28) — 50% cheaper!

All three replacements are equal or cheaper than the retiring models. There's no reason to wait.

How to Use This Data

We've built interactive tools to help you apply this data to your specific use case:

Calculate Your Exact Costs

Use our free calculator to estimate your monthly spend across all 34 models, or upgrade to Pro for saved scenarios and optimization recommendations.

Free Calculator See Pro Features

Data verified May 29, 2026. Prices in USD per million tokens. This report is updated monthly. View the full interactive version with sortable tables, charts, and cost scenarios.