AI API Cost Per Request: The Metric Developers Actually Need
Token-based pricing is confusing. Here's how to think about LLM costs in terms developers actually use — cost per API call.
Published: May 13, 2026 · 6 min read
When you're budgeting for an AI-powered feature, nobody asks "how many tokens will this use?" They ask: "How much does each API call cost?"
Yet every LLM provider prices by tokens — millions of them. That's like a gas station selling fuel by the milliliter. Technically accurate, but not how people think.
Let's fix that.
What Is Cost Per Request?
Cost per request is the total price of a single API call to an LLM. It's calculated from two things:
- Input tokens — what you send to the model (your prompt, system instructions, context)
- Output tokens — what the model generates back (the response)
The formula is straightforward:
cost = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price
For example, a typical chat request with 500 input tokens and 300 output tokens on GPT-4o mini ($0.15/$0.60 per 1M) works out to (500 / 1,000,000 × $0.15) + (300 / 1,000,000 × $0.60) = $0.000075 + $0.00018 ≈ $0.00026.
That's the number that matters when you're planning a feature. Not "tokens per million" — but "dollars per call."
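The formula is small enough to keep as a helper. A minimal sketch, using the GPT-4o mini rates ($0.15/$0.60 per 1M) from the example above:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a single API call, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m

# The chat example above: 500 in / 300 out on GPT-4o mini ($0.15 / $0.60 per 1M)
chat_cost = cost_per_request(500, 300, 0.15, 0.60)   # ≈ $0.000255, i.e. ~$0.00026
```

Swap in any provider's per-1M prices and your own token counts to get the per-call figure directly.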
Cost Per Request Across 33 Models
Here's what a typical chat request (500 input / 300 output tokens) costs on a selection of popular models:
| Model | Provider | Cost per Request | Cost per 1K Requests |
|---|---|---|---|
| Llama 3.1 8B | Meta (Together.ai) | $0.00011 | $0.11 |
| Gemini 2.0 Flash Lite | Google | $0.00013 | $0.13 |
| GPT-oss 20B | OpenAI | $0.00015 | $0.15 |
| DeepSeek V4 Flash | DeepSeek | $0.00015 | $0.15 |
| Gemini 2.0 Flash | Google | $0.00017 | $0.17 |
| GPT-4o mini | OpenAI | $0.00026 | $0.26 |
| Mistral Small 4 | Mistral | $0.00026 | $0.26 |
| DeepSeek V4 Pro | DeepSeek | $0.00048 | $0.48 |
| GPT-5 mini | OpenAI | $0.00073 | $0.73 |
| Claude Haiku 4.5 | Anthropic | $0.00200 | $2.00 |
| GPT-5 | OpenAI | $0.00363 | $3.63 |
| Gemini 2.5 Pro | Google | $0.00363 | $3.63 |
| Claude Sonnet 4 | Anthropic | $0.00600 | $6.00 |
| GPT-5.5 | OpenAI | $0.01150 | $11.50 |
| Claude 4 Opus | Anthropic | $0.03000 | $30.00 |
The range is massive: $0.00011 to $0.03000 per request — a 270x difference between the cheapest and most expensive model for the same workload.
Why Cost Per Request Matters More Than Token Pricing
1. It's what you actually budget against
When your PM asks "how much will this chatbot feature cost?", the answer is never "3 million tokens per month." It's "$50/month at 1,000 requests per day" or "$0.05 per conversation."
2. It makes model comparison intuitive
Is GPT-5 mini worth the premium over GPT-4o mini? At $0.00073 vs $0.00026 per request, that's a 2.8x cost increase. Now you can decide if the quality improvement justifies 2.8x the price.
3. It reveals hidden costs in your architecture
If your RAG pipeline makes 3 LLM calls per user query (classify → retrieve → generate), your true cost per user interaction is 3x the single-request cost. Token-based thinking hides this; request-based thinking exposes it.
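In code, the per-interaction cost is just a sum over the pipeline's calls. The three per-call figures below are hypothetical, chosen only to illustrate the multiplier effect:

```python
# Hypothetical per-call costs for a 3-step RAG pipeline (illustrative numbers)
pipeline_calls = {
    "classify": 0.00010,   # short classification prompt
    "retrieve": 0.00015,   # query-rewrite / retrieval call
    "generate": 0.00081,   # final answer generation with retrieved context
}

# What one user query actually costs: every call, not just the visible one
cost_per_interaction = sum(pipeline_calls.values())
```

Budgeting per interaction rather than per call is what surfaces this multiplier.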
4. It scales linearly with users
1,000 users × 5 requests each = 5,000 requests. Multiply by cost per request. Done. No need to estimate token distributions.
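The arithmetic above, sketched out (the $0.00026 per-request figure is the GPT-4o mini chat cost from the table):

```python
users = 1_000
requests_per_user_per_day = 5
cost_per_request = 0.00026            # GPT-4o mini, typical chat request

daily_requests = users * requests_per_user_per_day   # 5,000 requests/day
daily_cost = daily_requests * cost_per_request
monthly_cost = daily_cost * 30
print(f"${daily_cost:.2f}/day, ${monthly_cost:.2f}/month")  # $1.30/day, $39.00/month
```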
Common Request Types and Their Costs
Different workloads have very different token profiles. Here are typical patterns:
| Request Type | Input Tokens | Output Tokens | GPT-4o mini | Claude Sonnet 4 |
|---|---|---|---|---|
| Chat message | 500 | 300 | $0.00026 | $0.00600 |
| Code generation | 2,000 | 1,500 | $0.00120 | $0.02850 |
| Document analysis | 4,000 | 500 | $0.00090 | $0.01950 |
| RAG query | 3,000 | 600 | $0.00081 | $0.01800 |
| Content writing | 500 | 2,000 | $0.00128 | $0.03150 |
A code generation request costs 4-5x more than a simple chat message because it produces many more output tokens. This is obvious once you see it in request-level terms — but easy to miss when you're thinking in raw token counts.
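The workload table can be regenerated from the per-1M rates. A sketch, assuming $0.15/$0.60 for GPT-4o mini and $3/$15 for Claude Sonnet 4:

```python
PRICES = {            # $ per 1M tokens: (input, output)
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
}
WORKLOADS = {         # typical token profile: (input, output)
    "Chat message": (500, 300),
    "Code generation": (2_000, 1_500),
    "Content writing": (500, 2_000),
}

def workload_cost(tokens_in, tokens_out, price_in, price_out):
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# cost table: workload -> model -> dollars per request
table = {w: {m: workload_cost(tin, tout, pin, pout)
             for m, (pin, pout) in PRICES.items()}
         for w, (tin, tout) in WORKLOADS.items()}
```

Adding a workload is one line; the cost profile of a new feature falls out immediately.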
How to Calculate Your Cost Per Request
Three steps:
- Measure your typical request. Log input and output token counts for a sample of real requests. Find the median (not average — outliers skew it).
- Look up the model's pricing. Input price per 1M tokens, output price per 1M tokens.
- Apply the formula.
(median_input / 1M × input_price) + (median_output / 1M × output_price)
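Steps 1 and 3 together, as a sketch that assumes you've logged (input, output) token pairs for a sample of real requests:

```python
import statistics

def typical_cost_per_request(samples, input_price_per_m, output_price_per_m):
    """samples: iterable of (input_tokens, output_tokens) pairs from logged requests."""
    med_in = statistics.median(s[0] for s in samples)
    med_out = statistics.median(s[1] for s in samples)
    return (med_in / 1_000_000) * input_price_per_m \
        + (med_out / 1_000_000) * output_price_per_m

# One 10,000-token outlier barely moves the median, unlike the mean
logged = [(450, 280), (500, 300), (520, 310), (10_000, 300)]
estimate = typical_cost_per_request(logged, 0.15, 0.60)
```

This is exactly why the median is the right summary here: the one oversized request would drag the mean input count past 2,800 tokens and inflate the estimate.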
Or just use our cost calculator — enter your typical token counts and it shows cost per request, cost per 1K requests, and monthly total instantly.
Cost Optimization Strategies (In Request Terms)
Reduce input tokens per request
Shorter system prompts, smarter context selection, and prompt compression all reduce the input side. Cutting input from 3,000 to 1,500 tokens on Claude Sonnet 4 saves $0.0045 per request — that's $4.50 per 1,000 requests.
Limit output tokens per request
Set max_tokens appropriately. If your chatbot typically needs 200 tokens, don't leave the default at 4,096. The model stops generating when it's done, but a lower limit prevents runaway responses.
Use the cheapest model that works
Not every request needs GPT-5. Route simple queries to GPT-4o mini ($0.00026/request) and complex ones to GPT-5 ($0.00363/request). A smart routing strategy can cut costs by 60-80%.
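A routing layer can start as a heuristic classifier in front of the API call. The model names match the article's examples, but the thresholds below are illustrative, not a production policy:

```python
CHEAP_MODEL = "gpt-4o-mini"       # ~$0.00026 per typical chat request
PREMIUM_MODEL = "gpt-5"           # ~$0.00363 per typical chat request

def pick_model(query: str) -> str:
    """Naive heuristic: short, non-code queries go to the cheap model."""
    looks_simple = len(query) < 200 and "code" not in query.lower()
    return CHEAP_MODEL if looks_simple else PREMIUM_MODEL

pick_model("What are your support hours?")          # -> "gpt-4o-mini"
pick_model("Write code that parses this CSV ...")   # -> "gpt-5"
```

Real routers use a small classifier model or embedding similarity instead of string heuristics, but the cost structure is the same: most traffic lands on the cheap path.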
Batch similar requests
If you're processing 100 documents, batch them into fewer API calls with multiple documents per prompt. Fewer requests = fewer per-request overhead costs.
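A sketch of the batching idea; the chunk size and document list are placeholders to tune against your context window:

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size` per API call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

documents = [f"document {i}" for i in range(100)]
batches = chunk(documents, 10)   # 10 API calls instead of 100
```

Each batch still pays for its tokens, so the saving is in per-request overhead (fixed system-prompt tokens, latency, rate limits), not in the documents themselves.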
The Bottom Line
Token-based pricing is how providers charge. But request-based thinking is how engineers budget.
When you know that each API call costs $0.00026 on GPT-4o mini or $0.006 on Claude Sonnet 4, you can make real architectural decisions: which model to use, how many calls to make per user interaction, whether to cache responses, and when to batch requests.
The 270x cost difference between the cheapest and most expensive model isn't visible in token pricing tables. It's crystal clear when you see it as cost per request.