OpenAI GPT-oss Pricing: Open-Source Models at $0.08/1M Tokens
OpenAI enters the open-source API market with two models priced to compete with Llama and DeepSeek. Here's what they cost and when to use them.
Pricing at a Glance
- GPT-oss 20B: $0.08 / 1M input tokens, $0.35 / 1M output tokens, 128K context window
- GPT-oss 120B: $0.15 / 1M input tokens, $0.60 / 1M output tokens, 128K context window
OpenAI's GPT-oss models are a departure from the company's typical closed-source approach. These are open-weight models available through OpenAI's API at budget-tier pricing — designed to compete directly with Meta's Llama and DeepSeek's V4 lineup.
The 120B model is priced identically to GPT-4o mini ($0.15/$0.60), while the 20B model undercuts almost everything on the market at $0.08/$0.35.
How GPT-oss Compares to Competitors
| Model | Input (per 1M) | Output (per 1M) | Context | Type |
|---|---|---|---|---|
| GPT-oss 20B | $0.08 | $0.35 | 128K | Open-weight |
| GPT-oss 120B | $0.15 | $0.60 | 128K | Open-weight |
| Llama 3.1 8B (Together.ai) | $0.10 | $0.10 | 128K | Open-weight |
| Llama 3.1 70B (Together.ai) | $0.88 | $0.88 | 128K | Open-weight |
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | Closed |
| DeepSeek V4 Pro | $0.44 | $0.87 | 1M | Closed |
| GPT-4o mini | $0.15 | $0.60 | 128K | Closed |
| Mistral Small 4 | $0.15 | $0.60 | 128K | Closed |
Key Takeaway
GPT-oss 120B is priced identically to GPT-4o mini and Mistral Small 4. The 20B model is the cheapest OpenAI model available, undercutting even Llama 3.1 8B on input pricing. However, Llama 3.1 8B has cheaper output tokens ($0.10 vs $0.35), which matters for generation-heavy workloads.
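That input-vs-output tradeoff has a concrete break-even point you can derive from the per-1M-token prices in the table; a quick sketch (no assumptions beyond the listed prices):

```python
# Break-even between GPT-oss 20B and Llama 3.1 8B, using the
# per-1M-token prices from the comparison table above.
GPT_OSS_20B = (0.08, 0.35)  # (input, output) $/1M tokens
LLAMA_8B = (0.10, 0.10)

def cost(prices, in_tokens, out_tokens):
    """Dollar cost of a workload, given (input, output) prices per 1M tokens."""
    p_in, p_out = prices
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# Equal cost when 0.08*i + 0.35*o == 0.10*i + 0.10*o, i.e. i/o == 0.25/0.02
breakeven = (GPT_OSS_20B[1] - LLAMA_8B[1]) / (LLAMA_8B[0] - GPT_OSS_20B[0])
print(round(breakeven, 1))  # 12.5 -- 20B wins only when input:output exceeds ~12.5:1
```

So for generation-heavy traffic (say 1:1 input to output), Llama 3.1 8B stays cheaper; for prompt-heavy classification with tiny outputs, GPT-oss 20B pulls ahead.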
Monthly Cost Scenarios
Here's what you'd pay for common usage patterns:
| Scenario | GPT-oss 120B | GPT-oss 20B | GPT-4o mini | DeepSeek V4 Flash |
|---|---|---|---|---|
| 100K req/day, 2K in / 500 out | $1,800/mo | $1,005/mo | $1,800/mo | $1,260/mo |
| 1M req/day, 1K in / 200 out | $8,100/mo | $4,500/mo | $8,100/mo | $5,880/mo |
| 10M req/day, 500 in / 100 out | $40,500/mo | $22,500/mo | $40,500/mo | $29,400/mo |
At high volume, GPT-oss 20B saves $18,000/mo over GPT-4o mini for the same workload (figures assume a 30-day month). That's real money for startups burning through API budgets.
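Scenario math like this is easy to script yourself; a minimal sketch, assuming a 30-day month and the per-1M-token prices listed earlier:

```python
# Monthly API cost from requests/day and tokens per request.
# Prices ($ per 1M tokens) taken from the comparison table above.
PRICES = {
    "GPT-oss 120B": (0.15, 0.60),
    "GPT-oss 20B": (0.08, 0.35),
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

def monthly_cost(model, req_per_day, in_tok, out_tok, days=30):
    """Dollars per month for a steady workload (30-day month assumed)."""
    p_in, p_out = PRICES[model]
    month_in = req_per_day * in_tok * days    # total input tokens per month
    month_out = req_per_day * out_tok * days  # total output tokens per month
    return (month_in * p_in + month_out * p_out) / 1_000_000

print(round(monthly_cost("GPT-oss 20B", 100_000, 2_000, 500), 2))   # 1005.0
print(round(monthly_cost("GPT-4o mini", 100_000, 2_000, 500), 2))   # 1800.0
```

Swap in your own request volume and token counts to see where the models diverge for your traffic shape.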
When to Use GPT-oss
- High-volume, low-complexity tasks: Classification, routing, simple Q&A, content moderation
- Batch processing: When you need to process millions of documents cheaply
- Prototyping: Test ideas without burning through expensive API credits
- Self-hosting option: As open-weight models, you can also self-host for even lower costs at scale
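For the classification and moderation use case, a call might look like the sketch below. The model identifier `gpt-oss-20b` and the one-word-label prompt are illustrative assumptions, not confirmed API names; capping `max_tokens` keeps spend on the cheap input side of the pricing.

```python
def moderation_messages(text):
    """Build a minimal one-word classification prompt (keeps output tokens tiny)."""
    return [
        {"role": "system",
         "content": "Classify the user message. Reply with exactly one word: SAFE or UNSAFE."},
        {"role": "user", "content": text},
    ]

def classify(text, model="gpt-oss-20b"):  # model name is an assumption
    from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=moderation_messages(text),
        max_tokens=2,    # hard cap on the pricier output side
        temperature=0,   # deterministic labels for moderation
    )
    return resp.choices[0].message.content.strip()
```

With a 2-token label, nearly all spend lands on input tokens, which is exactly where GPT-oss 20B is cheapest.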
When to Avoid GPT-oss
- Complex reasoning: The 20B model especially may underperform on multi-step logic
- Long context: the 128K window is adequate but falls short of the 1M-token windows on Gemini and DeepSeek V4
- Code generation: GPT-5.3 Codex or Claude Sonnet 4 will produce better code
- Critical applications: For production systems where quality matters more than cost, stick with proven models
Calculate your exact savings: Compare GPT-oss against your current model to see how much you'd save.
Try the APIpulse Calculator
The Bigger Picture
OpenAI entering the open-weight market signals that the budget LLM tier is getting crowded. With GPT-oss, Llama 4, DeepSeek V4, Mistral Small, and Gemini Flash all competing under $0.20/1M input tokens, developers have more affordable options than ever.
The real winner isn't any single model — it's the downward pressure on pricing across the board. Use APIpulse to find the cheapest option for your specific workload.