OpenAI GPT-oss Pricing: Open-Source Models at $0.08/1M Tokens
OpenAI enters the open-source API market with two models priced to compete with Llama and DeepSeek. Here's what they cost and when to use them.
Pricing at a Glance
- GPT-oss 20B: $0.08 / 1M input tokens, $0.35 / 1M output tokens, 128K context window
- GPT-oss 120B: $0.15 / 1M input tokens, $0.60 / 1M output tokens, 128K context window
OpenAI's GPT-oss models are a departure from the company's typical closed-source approach. These are open-weight models available through OpenAI's API at budget-tier pricing — designed to compete directly with Meta's Llama and DeepSeek's V4 lineup.
The 120B model is priced identically to GPT-4o mini ($0.15/$0.60), while the 20B model undercuts almost everything on the market at $0.08/$0.35.
How GPT-oss Compares to Competitors
| Model | Input (per 1M) | Output (per 1M) | Context | Type |
|---|---|---|---|---|
| GPT-oss 20B | $0.08 | $0.35 | 128K | Open-weight |
| GPT-oss 120B | $0.15 | $0.60 | 128K | Open-weight |
| Llama 3.1 8B (Together.ai) | $0.10 | $0.10 | 128K | Open-weight |
| Llama 3.1 70B (Together.ai) | $0.88 | $0.88 | 128K | Open-weight |
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | Closed |
| DeepSeek V4 Pro | $0.44 | $0.87 | 1M | Closed |
| GPT-4o mini | $0.15 | $0.60 | 128K | Closed |
| Mistral Small 4 | $0.15 | $0.60 | 128K | Closed |
Key Takeaway
GPT-oss 120B is priced identically to GPT-4o mini and Mistral Small 4. The 20B model is the cheapest OpenAI model available, undercutting even Llama 3.1 8B on input pricing. However, Llama 3.1 8B has cheaper output tokens ($0.10 vs $0.35), which matters for generation-heavy workloads.
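That input-vs-output tradeoff has a concrete break-even point you can derive from the per-1M-token prices in the table; a quick sketch (no assumptions beyond the listed prices):

```python
# Break-even between GPT-oss 20B and Llama 3.1 8B, using the
# per-1M-token prices from the comparison table above.
GPT_OSS_20B = (0.08, 0.35)  # (input, output) $/1M tokens
LLAMA_8B = (0.10, 0.10)

def cost(prices, in_tokens, out_tokens):
    """Dollar cost of a workload, given (input, output) prices per 1M tokens."""
    p_in, p_out = prices
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# Equal cost when 0.08*i + 0.35*o == 0.10*i + 0.10*o, i.e. i/o == 0.25/0.02
breakeven = (GPT_OSS_20B[1] - LLAMA_8B[1]) / (LLAMA_8B[0] - GPT_OSS_20B[0])
print(round(breakeven, 1))  # 12.5 -- 20B wins only when input:output exceeds ~12.5:1
```

So for generation-heavy traffic (say 1:1 input to output), Llama 3.1 8B stays cheaper; for prompt-heavy classification with tiny outputs, GPT-oss 20B pulls ahead.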
Monthly Cost Scenarios
Here's what you'd pay for common usage patterns:
| Scenario | GPT-oss 120B | GPT-oss 20B | GPT-4o mini | DeepSeek V4 Flash |
|---|---|---|---|---|
| 100K req/day, 2K in / 500 out | $1,800/mo | $1,005/mo | $1,800/mo | $1,260/mo |
| 1M req/day, 1K in / 200 out | $8,100/mo | $4,500/mo | $8,100/mo | $5,880/mo |
| 10M req/day, 500 in / 100 out | $40,500/mo | $22,500/mo | $40,500/mo | $29,400/mo |
At high volume, GPT-oss 20B saves $18,000/mo over GPT-4o mini for the same workload (figures assume a 30-day month). That's real money for startups burning through API budgets.
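Scenario math like this is easy to script yourself; a minimal sketch, assuming a 30-day month and the per-1M-token prices listed earlier:

```python
# Monthly API cost from requests/day and tokens per request.
# Prices ($ per 1M tokens) taken from the comparison table above.
PRICES = {
    "GPT-oss 120B": (0.15, 0.60),
    "GPT-oss 20B": (0.08, 0.35),
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

def monthly_cost(model, req_per_day, in_tok, out_tok, days=30):
    """Dollars per month for a steady workload (30-day month assumed)."""
    p_in, p_out = PRICES[model]
    month_in = req_per_day * in_tok * days    # total input tokens per month
    month_out = req_per_day * out_tok * days  # total output tokens per month
    return (month_in * p_in + month_out * p_out) / 1_000_000

print(round(monthly_cost("GPT-oss 20B", 100_000, 2_000, 500), 2))   # 1005.0
print(round(monthly_cost("GPT-4o mini", 100_000, 2_000, 500), 2))   # 1800.0
```

Swap in your own request volume and token counts to see where the models diverge for your traffic shape.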
When to Use GPT-oss
- High-volume, low-complexity tasks: Classification, routing, simple Q&A, content moderation
- Batch processing: When you need to process millions of documents cheaply
- Prototyping: Test ideas without burning through expensive API credits
- Self-hosting option: As open-weight models, you can also self-host for even lower costs at scale
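For the classification and moderation use case, a call might look like the sketch below. The model identifier `gpt-oss-20b` and the one-word-label prompt are illustrative assumptions, not confirmed API names; capping `max_tokens` keeps spend on the cheap input side of the pricing.

```python
def moderation_messages(text):
    """Build a minimal one-word classification prompt (keeps output tokens tiny)."""
    return [
        {"role": "system",
         "content": "Classify the user message. Reply with exactly one word: SAFE or UNSAFE."},
        {"role": "user", "content": text},
    ]

def classify(text, model="gpt-oss-20b"):  # model name is an assumption
    from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=moderation_messages(text),
        max_tokens=2,    # hard cap on the pricier output side
        temperature=0,   # deterministic labels for moderation
    )
    return resp.choices[0].message.content.strip()
```

With a 2-token label, nearly all spend lands on input tokens, which is exactly where GPT-oss 20B is cheapest.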
When to Avoid GPT-oss
- Complex reasoning: The 20B model especially may underperform on multi-step logic
- Long context: the 128K window is adequate but falls short of the 1M-token windows on Gemini and DeepSeek V4
- Code generation: GPT-5.3 Codex or Claude Sonnet 4 will produce better code
- Critical applications: For production systems where quality matters more than cost, stick with proven models
Calculate your exact savings: Compare GPT-oss against your current model to see how much you'd save.
Try the APIpulse Calculator
The Bigger Picture
OpenAI entering the open-weight market signals that the budget LLM tier is getting crowded. With GPT-oss, Llama 4, DeepSeek V4, Mistral Small, and Gemini Flash all competing under $0.20/1M input tokens, developers have more affordable options than ever.
The real winner isn't any single model — it's the downward pressure on pricing across the board. Use APIpulse to find the cheapest option for your specific workload.