📅 May 31, 2026 ⏱️ 9 min read 🏷️ Pricing Report

AI API Pricing August 2026: Complete Guide to All 32 Models

32 models. 10 providers. Prices from $0.075/M to $180/M. The post-deprecation market has stabilized. Here's where every model stands.

August 2026 marks the second full month after Anthropic retired Claude 4 Opus and Claude Sonnet 4. The 32-model market has settled into a stable rhythm. Budget tiers continue to compress, the mid-tier is more competitive than ever, and for the first time since the deprecation dust settled, we can see clear pricing trends emerging.

This guide covers every AI API model's current pricing, the stable post-deprecation landscape, Q3 pricing trends, and where the best deals are right now.

32 Models Available
10 Providers
$0.075 Cheapest / 1M tokens
0 Active Deprecations

📊 Market Stability: Post-Deprecation Settled

Two months after Claude 4 Opus and Claude Sonnet 4 were retired, the market has stabilized. Claude Opus 4.7/4.8 ($5/$25) and Claude Sonnet 4.6 ($3/$15) are firmly established as the replacements. No major deprecations are expected through Q3 2026. If you haven't migrated from the retired models yet, your code is likely broken — switch now.

Complete Pricing: All 32 Models

Every major AI API model ranked by input price. All prices per 1 million tokens.

Budget Tier — Under $0.60/1M Input

Rock-bottom prices for everyday tasks. Chatbots, classifiers, content tools — start here.

Model Provider Input / 1M Output / 1M Context
Gemini 2.0 Flash Lite Google $0.075 $0.30 1M
GPT-oss 20B OpenAI $0.08 $0.35 128K
Llama 3.1 8B Meta (Together.ai) $0.10 $0.10 128K
Gemini 2.0 Flash Google $0.10 $0.40 1M
Llama 4 Scout Meta (Together.ai) $0.11 $0.34 10M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 1M
GPT-4o mini OpenAI $0.15 $0.60 128K
GPT-oss 120B OpenAI $0.15 $0.60 128K
Mistral Small 4 Mistral $0.15 $0.60 128K
Llama 4 Maverick Meta (Together.ai) $0.20 $0.60 10M
GPT-5 mini OpenAI $0.25 $2.00 272K
DeepSeek V3 DeepSeek $0.27 $1.10 128K
DeepSeek V4 Pro DeepSeek $0.44 $0.87 1M
Mistral Large 3 Mistral $0.50 $1.50 128K
Command R Cohere $0.50 $1.50 128K
Grok Build 0.1 xAI $0.30 $0.50 256K

Mid Tier — $0.50–$3.00/1M Input

The sweet spot for production workloads. Strong reasoning at reasonable prices.

Model Provider Input / 1M Output / 1M Context
Llama 3.1 70B Meta (Together.ai) $0.88 $0.88 128K
Kimi K2.6 Moonshot $0.90 $3.75 256K
Claude Haiku 4.5 Anthropic $1.00 $5.00 200K
Gemini 2.5 Pro Google $1.25 $10.00 1M
GPT-5 OpenAI $1.25 $10.00 272K
GPT-5.3 Codex OpenAI $1.75 $14.00 400K
Gemini 3.1 Pro Google $2.00 $12.00 1M
Jamba 1.5 Large AI21 $2.00 $8.00 256K
GPT-4o OpenAI $2.50 $10.00 128K
Command R+ Cohere $2.50 $10.00 128K
Claude Sonnet 4.6 Anthropic $3.00 $15.00 1M
Grok 4.3 xAI $1.25 $2.50 1M

Premium Tier — $5.00+/1M Input

For complex reasoning, code generation, and high-stakes tasks where quality is non-negotiable.

Model Provider Input / 1M Output / 1M Context
Claude Opus 4.8 Anthropic $5.00 $25.00 1M
Claude Opus 4.7 Anthropic $5.00 $25.00 1M
GPT-5.5 OpenAI $5.00 $30.00 1M
GPT-5.5 Pro OpenAI $30.00 $180.00 1M

Q3 2026 Pricing Trends

1. The Market Has Stabilized

After the June 15 deprecation of Claude 4 Opus and Claude Sonnet 4, August sees a calm market. No major model retirements are expected through Q3. The 32-model landscape is the new normal — at least until Q4 when OpenAI and Google typically announce new generations.

2. Budget Tier Compression Continues

The sub-$0.15/M tier is now crowded. Gemini 2.0 Flash Lite ($0.075/M), GPT-oss 20B ($0.08/M), Llama 3.1 8B ($0.10/M), and Gemini 2.0 Flash ($0.10/M) all compete in the same space. The 133,000-tokens-per-penny mark set in July still holds — no provider has broken below it yet.

3. Mid-Tier Is the New Premium

For most production workloads, the $1–3/M tier delivers everything you need. Claude Haiku 4.5 ($1/M) handles 90% of tasks that used to require premium models. Claude Sonnet 4.6 ($3/M) with 1M context is the best code generation model at any price. Going premium ($5+/M) is only justified for the most complex reasoning tasks.

4. Open Source Holds Steady

Meta's Llama 4 Scout ($0.11/M, 10M context) remains the only model offering 10M token context at any price. No new Llama releases are expected until late Q3 or Q4. Together.ai's managed hosting continues to make open-source models zero-ops for teams without infrastructure.

5. xAI Pricing Remains an Outlier

Grok 4.3 at $1.25/$2.50 per 1M tokens is now competitive after xAI's rebrand and repricing. Grok Build 0.1 ($0.30/$0.50) joins the budget tier. The old Grok 3 ($30/$150) has been retired.

Best Deals by Use Case

Use Case Best Model Why
High-volume chatbot Gemini 2.0 Flash Lite Cheapest at $0.075/M, handles most chat tasks
Quality chatbot Claude Haiku 4.5 $1/M with Anthropic's quality
Code generation Claude Sonnet 4.6 Best code quality at $3/M, 1M context
Document analysis Gemini 2.5 Pro 1M context at $1.25/M
Classification / extraction GPT-4o mini $0.15/M, fast, reliable structured output
RAG / retrieval DeepSeek V4 Flash $0.14/M with 1M context
Content writing GPT-5 mini $0.25/M, strong writing at budget price
Complex reasoning Claude Opus 4.7 Best reasoning quality at $5/M
Agent / multi-step GPT-5 $1.25/M, strong tool use, 272K context
Long documents (10M+) Llama 4 Scout Only model with 10M context at $0.11/M
Budget all-around DeepSeek V4 Pro $0.44/M with 1M context — best value pick

What $100/Month Gets You in August 2026

Assuming 1,000 tokens per request with a 50/50 input/output split:

Tier Model Requests for $100 Daily Average
Budget Gemini 2.0 Flash Lite ~571,000 ~19,000/day
Budget DeepSeek V4 Flash ~476,000 ~15,900/day
Budget Llama 4 Scout ~434,000 ~14,500/day
Mid Claude Haiku 4.5 ~62,500 ~2,100/day
Mid GPT-5 ~30,800 ~1,030/day
Mid Claude Sonnet 4.6 ~22,200 ~740/day
Premium Claude Opus 4.7 ~8,000 ~267/day
Premium GPT-5.5 ~7,700 ~257/day

The range: 19,000 requests/day to 257 requests/day for the same $100 budget. Model selection is the single biggest cost lever you have.

Provider Comparison at a Glance

Provider Models Cheapest Most Expensive Best For
OpenAI 9 $0.08/M $180/M Widest range, agents
Anthropic 4 $1.00/M $25/M Code, reasoning
Google 4 $0.075/M $12/M Budget, long context
DeepSeek 3 $0.14/M $1.10/M Budget all-around
Meta (Together.ai) 4 $0.10/M $0.88/M Open source, 10M context
Mistral 2 $0.15/M $1.50/M European compliance
Cohere 2 $0.50/M $10/M RAG, enterprise search
Moonshot 1 $0.90/M $3.75/M Long context (256K)
xAI 2 $3.00/M $150/M Real-time data
AI21 1 $2.00/M $8/M Long context (256K)

What to Watch in Q4 2026

💡 Cost Optimization Tip

If you're still using GPT-4o ($2.50/M) for general tasks, switching to Claude Haiku 4.5 ($1/M) saves 60% with comparable quality. For classification/extraction, GPT-4o mini ($0.15/M) is 17x cheaper than GPT-4o. Use our Cost Optimizer to find savings in your current setup.

Methodology

All pricing data comes from official provider pricing pages, verified as of May 29, 2026. We track 32 active models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21.

Prices are per 1 million tokens unless otherwise noted. Context window sizes reflect the maximum supported. Some providers offer batch pricing or committed-use discounts not reflected here.

Calculate your exact costs

Use our free tools to see what these prices mean for your workload. No signup required.

Open Cost Calculator →

Related Tools

← July 2026 Pricing Guide State of LLM Pricing June 2026 →