AI API Cost Per Request: The Metric Developers Actually Need
Token-based pricing is confusing. Here's how to think about LLM costs in terms developers actually use — cost per API call.
Published: May 13, 2026 · 6 min read
When you're budgeting for an AI-powered feature, nobody asks "how many tokens will this use?" They ask: "How much does each API call cost?"
Yet every LLM provider prices by tokens — millions of them. That's like a gas station selling fuel by the milliliter. Technically accurate, but not how people think.
Let's fix that.
What Is Cost Per Request?
Cost per request is the total price of a single API call to an LLM. It's calculated from two things:
- Input tokens — what you send to the model (your prompt, system instructions, context)
- Output tokens — what the model generates back (the response)
The formula is straightforward:
cost = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price
For example, a typical chat request with 500 input tokens and 300 output tokens on GPT-4o mini ($0.15/$0.60 per 1M) works out to (500 / 1,000,000 × $0.15) + (300 / 1,000,000 × $0.60) = $0.000075 + $0.00018 ≈ $0.00026.
That's the number that matters when you're planning a feature. Not "tokens per million" — but "dollars per call."
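The formula is small enough to keep as a helper. A minimal sketch, using the GPT-4o mini rates ($0.15/$0.60 per 1M) from the example above:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a single API call, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m

# The chat example above: 500 in / 300 out on GPT-4o mini ($0.15 / $0.60 per 1M)
chat_cost = cost_per_request(500, 300, 0.15, 0.60)   # ≈ $0.000255, i.e. ~$0.00026
```

Swap in any provider's per-1M prices and your own token counts to get the per-call figure directly.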
Cost Per Request Across 33 Models
Here's what a typical chat request (500 input / 300 output tokens) costs on a selection of popular models:
| Model | Provider | Cost per Request | Cost per 1K Requests |
|---|---|---|---|
| Llama 3.1 8B | Meta (Together.ai) | $0.00011 | $0.11 |
| Gemini 2.0 Flash Lite | Google | $0.00013 | $0.13 |
| GPT-oss 20B | OpenAI | $0.00015 | $0.15 |
| DeepSeek V4 Flash | DeepSeek | $0.00015 | $0.15 |
| Gemini 2.0 Flash | Google | $0.00017 | $0.17 |
| GPT-4o mini | OpenAI | $0.00026 | $0.26 |
| Mistral Small 4 | Mistral | $0.00026 | $0.26 |
| DeepSeek V4 Pro | DeepSeek | $0.00048 | $0.48 |
| GPT-5 mini | OpenAI | $0.00073 | $0.73 |
| Claude Haiku 4.5 | Anthropic | $0.00200 | $2.00 |
| GPT-5 | OpenAI | $0.00363 | $3.63 |
| Gemini 2.5 Pro | Google | $0.00363 | $3.63 |
| Claude Sonnet 4 | Anthropic | $0.00600 | $6.00 |
| GPT-5.5 | OpenAI | $0.01150 | $11.50 |
| Claude 4 Opus | Anthropic | $0.03000 | $30.00 |
The range is massive: $0.00011 to $0.03000 per request — a 270x difference between the cheapest and most expensive model for the same workload.
Why Cost Per Request Matters More Than Token Pricing
1. It's what you actually budget against
When your PM asks "how much will this chatbot feature cost?", the answer is never "3 million tokens per month." It's "$50/month at 1,000 requests per day" or "$0.05 per conversation."
2. It makes model comparison intuitive
Is GPT-5 mini worth the premium over GPT-4o mini? At $0.00073 vs $0.00026 per request, that's a 2.8x cost increase. Now you can decide if the quality improvement justifies 2.8x the price.
3. It reveals hidden costs in your architecture
If your RAG pipeline makes 3 LLM calls per user query (classify → retrieve → generate), your true cost per user interaction is 3x the single-request cost. Token-based thinking hides this; request-based thinking exposes it.
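In code, the per-interaction cost is just a sum over the pipeline's calls. The three per-call figures below are hypothetical, chosen only to illustrate the multiplier effect:

```python
# Hypothetical per-call costs for a 3-step RAG pipeline (illustrative numbers)
pipeline_calls = {
    "classify": 0.00010,   # short classification prompt
    "retrieve": 0.00015,   # query-rewrite / retrieval call
    "generate": 0.00081,   # final answer generation with retrieved context
}

# What one user query actually costs: every call, not just the visible one
cost_per_interaction = sum(pipeline_calls.values())
```

Budgeting per interaction rather than per call is what surfaces this multiplier.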
4. It scales linearly with users
1,000 users × 5 requests each = 5,000 requests. Multiply by cost per request. Done. No need to estimate token distributions.
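The arithmetic above, sketched out (the $0.00026 per-request figure is the GPT-4o mini chat cost from the table):

```python
users = 1_000
requests_per_user_per_day = 5
cost_per_request = 0.00026            # GPT-4o mini, typical chat request

daily_requests = users * requests_per_user_per_day   # 5,000 requests/day
daily_cost = daily_requests * cost_per_request
monthly_cost = daily_cost * 30
print(f"${daily_cost:.2f}/day, ${monthly_cost:.2f}/month")  # $1.30/day, $39.00/month
```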
Common Request Types and Their Costs
Different workloads have very different token profiles. Here are typical patterns:
| Request Type | Input Tokens | Output Tokens | GPT-4o mini | Claude Sonnet 4 |
|---|---|---|---|---|
| Chat message | 500 | 300 | $0.00026 | $0.00600 |
| Code generation | 2,000 | 1,500 | $0.00120 | $0.02850 |
| Document analysis | 4,000 | 500 | $0.00090 | $0.01950 |
| RAG query | 3,000 | 600 | $0.00081 | $0.01800 |
| Content writing | 500 | 2,000 | $0.00128 | $0.03150 |
A code generation request costs 4-5x more than a simple chat message because it produces many more output tokens. This is obvious once you see it in request-level terms — but easy to miss when you're thinking in raw token counts.
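The workload table can be regenerated from the per-1M rates. A sketch, assuming $0.15/$0.60 for GPT-4o mini and $3/$15 for Claude Sonnet 4:

```python
PRICES = {            # $ per 1M tokens: (input, output)
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
}
WORKLOADS = {         # typical token profile: (input, output)
    "Chat message": (500, 300),
    "Code generation": (2_000, 1_500),
    "Content writing": (500, 2_000),
}

def workload_cost(tokens_in, tokens_out, price_in, price_out):
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# cost table: workload -> model -> dollars per request
table = {w: {m: workload_cost(tin, tout, pin, pout)
             for m, (pin, pout) in PRICES.items()}
         for w, (tin, tout) in WORKLOADS.items()}
```

Adding a workload is one line; the cost profile of a new feature falls out immediately.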
How to Calculate Your Cost Per Request
Three steps:
- Measure your typical request. Log input and output token counts for a sample of real requests. Find the median (not average — outliers skew it).
- Look up the model's pricing. Input price per 1M tokens, output price per 1M tokens.
- Apply the formula.
(median_input / 1M × input_price) + (median_output / 1M × output_price)
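Steps 1 and 3 together, as a sketch that assumes you've logged (input, output) token pairs for a sample of real requests:

```python
import statistics

def typical_cost_per_request(samples, input_price_per_m, output_price_per_m):
    """samples: iterable of (input_tokens, output_tokens) pairs from logged requests."""
    med_in = statistics.median(s[0] for s in samples)
    med_out = statistics.median(s[1] for s in samples)
    return (med_in / 1_000_000) * input_price_per_m \
        + (med_out / 1_000_000) * output_price_per_m

# One 10,000-token outlier barely moves the median, unlike the mean
logged = [(450, 280), (500, 300), (520, 310), (10_000, 300)]
estimate = typical_cost_per_request(logged, 0.15, 0.60)
```

This is exactly why the median is the right summary here: the one oversized request would drag the mean input count past 2,800 tokens and inflate the estimate.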
Or just use our cost calculator — enter your typical token counts and it shows cost per request, cost per 1K requests, and monthly total instantly.
Cost Optimization Strategies (In Request Terms)
Reduce input tokens per request
Shorter system prompts, smarter context selection, and prompt compression all reduce the input side. Cutting input from 3,000 to 1,500 tokens on Claude Sonnet 4 saves $0.0045 per request — that's $4.50 per 1,000 requests.
Limit output tokens per request
Set max_tokens appropriately. If your chatbot typically needs 200 tokens, don't leave the default at 4,096. The model stops generating when it's done, but a lower limit prevents runaway responses.
Use the cheapest model that works
Not every request needs GPT-5. Route simple queries to GPT-4o mini ($0.00026/request) and complex ones to GPT-5 ($0.00363/request). A smart routing strategy can cut costs by 60-80%.
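A routing layer can start as a heuristic classifier in front of the API call. The model names match the article's examples, but the thresholds below are illustrative, not a production policy:

```python
CHEAP_MODEL = "gpt-4o-mini"       # ~$0.00026 per typical chat request
PREMIUM_MODEL = "gpt-5"           # ~$0.00363 per typical chat request

def pick_model(query: str) -> str:
    """Naive heuristic: short, non-code queries go to the cheap model."""
    looks_simple = len(query) < 200 and "code" not in query.lower()
    return CHEAP_MODEL if looks_simple else PREMIUM_MODEL

pick_model("What are your support hours?")          # -> "gpt-4o-mini"
pick_model("Write code that parses this CSV ...")   # -> "gpt-5"
```

Real routers use a small classifier model or embedding similarity instead of string heuristics, but the cost structure is the same: most traffic lands on the cheap path.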
Batch similar requests
If you're processing 100 documents, batch them into fewer API calls with multiple documents per prompt. Fewer requests = fewer per-request overhead costs.
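A sketch of the batching idea; the chunk size and document list are placeholders to tune against your context window:

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size` per API call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

documents = [f"document {i}" for i in range(100)]
batches = chunk(documents, 10)   # 10 API calls instead of 100
```

Each batch still pays for its tokens, so the saving is in per-request overhead (fixed system-prompt tokens, latency, rate limits), not in the documents themselves.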
The Bottom Line
Token-based pricing is how providers charge. But request-based thinking is how engineers budget.
When you know that each API call costs $0.00026 on GPT-4o mini or $0.006 on Claude Sonnet 4, you can make real architectural decisions: which model to use, how many calls to make per user interaction, whether to cache responses, and when to batch requests.
The 270x cost difference between the cheapest and most expensive model isn't visible in token pricing tables. It's crystal clear when you see it as cost per request.