LLM API Pricing Glossary
Every term you need to understand LLM API pricing — from tokens to context windows to rate limits. Know what you're paying for.
Last updated: Jun 9, 2026
Token
A token is the fundamental unit of text that an AI model processes. It's a piece of a word — roughly 1 token ≈ 0.75 words in English, or about 4 characters. Common words like "the" and "and" are 1 token. Longer or uncommon words may be split into 2-3 tokens.
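As a rough illustration of the rules of thumb above, the following sketch estimates token counts from character or word counts. The function names are hypothetical, and the result is only an approximation: actual counts depend on each model's tokenizer.

```python
def estimate_tokens_from_text(text: str) -> int:
    """Rough estimate for English text: about 4 characters per token."""
    if not text:
        return 0
    # Heuristic only; a real tokenizer will usually give a different count.
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate using 1 token ~= 0.75 words."""
    return round(word_count / 0.75)


if __name__ == "__main__":
    print(estimate_tokens_from_text("a" * 400))
    print(estimate_tokens_from_words(750))
```

For billing-accurate numbers, use the token counts reported by the provider rather than an estimate like this one.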
See also: Per 1M Tokens, Input vs Output Pricing
Input vs Output Pricing
LLM APIs charge separately for input tokens (text you send) and output tokens (text the model generates). Output tokens almost always cost more, typically 3-10x more than input, because the model generates output one token at a time, while the whole input can be processed in parallel.
See also: Tokens, Cost per Request
Per 1M Tokens (per Million Tokens)
The standard pricing unit for LLM APIs. Prices are quoted as cost per 1 million tokens. To calculate your cost: (tokens used ÷ 1,000,000) × price per 1M. This makes it easy to compare models — just look at the price per 1M tokens.
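The formula above can be sketched in a few lines. The function name and the $3.00 price in the usage example are illustrative assumptions, not real quotes from any provider.

```python
def token_cost(tokens_used: int, price_per_1m: float) -> float:
    """Cost in dollars: (tokens used / 1,000,000) x price per 1M tokens."""
    return tokens_used / 1_000_000 * price_per_1m


if __name__ == "__main__":
    # Hypothetical example: 250,000 tokens at $3.00 per 1M tokens.
    print(f"${token_cost(250_000, 3.00):.2f}")
```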
See also: Tokens, Cost per Request
Context Window
The maximum number of tokens a model can process in a single API call, including both your input (prompt) and the model's output (response). A 200K context window means your prompt and the response together can total up to ~200K tokens. Larger context windows let you process longer documents, bigger codebases, and more conversation history without splitting content.
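A minimal sketch of a pre-flight check, assuming (as in the definition above) that input and output share one token budget. The function names are hypothetical, and individual providers may count the limit differently.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def max_input_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for the requested output budget."""
    return max(0, context_window - max_output_tokens)


if __name__ == "__main__":
    print(fits_context(190_000, 8_192, 200_000))
    print(max_input_tokens(200_000, 8_192))
```

Running a check like this before sending a request makes it clear when a document needs to be split or truncated.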
See also: Max Output Tokens, Tokens
TPS (Tokens per Second)
The speed at which a model generates output tokens. Higher TPS means faster responses. TPS is affected by the model's size, the provider's infrastructure, and the current load. Some providers offer "turbo" or "fast" modes that increase TPS at a higher price.
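To see what a TPS figure means in practice, this sketch converts it into an approximate generation time. It is a simplification that ignores network latency and the delay before the first token arrives; the function name is hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate a response at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 1,000-token response at 50 tokens per second.
    print(generation_seconds(1_000, 50))
```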
See also: RPM, Rate Limits
RPM (Requests per Minute)
The maximum number of API calls you can make per minute. This is a rate limit imposed by the provider to prevent abuse and ensure fair usage. If you exceed RPM, you'll get a 429 (Too Many Requests) error. Higher-tier accounts or paid plans typically have higher RPM limits.
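A common way to handle 429 errors is to retry with exponential backoff. The sketch below is one possible approach, not any provider's official client: `RateLimitError` is a hypothetical exception standing in for however your HTTP client reports a 429, and `request` is any zero-argument callable that makes the API call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return request()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # Give up after the final retry.
            # Wait base_delay * 2^attempt, plus random jitter so that many
            # clients do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
            attempt += 1
```

If the provider's response includes a header stating how long to wait, honoring that value is generally preferable to a computed delay.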
See also: TPM, Rate Limits
Rate Limits
Restrictions imposed by API providers on how many requests or tokens you can use within a given time period. Rate limits protect the provider's infrastructure and ensure fair usage across all customers. Common rate limit types include RPM (requests per minute), TPM (tokens per minute), and concurrent requests.
TPM (Tokens per Minute)
The maximum number of tokens you can process per minute across all your API calls. This includes both input and output tokens. TPM is often the more relevant limit for high-throughput applications because it accounts for the actual computational load.
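To see which limit binds first, this sketch computes the request rate you can sustain under both an RPM and a TPM limit. The function name and the limits in the usage example are made-up illustrations, not any provider's actual quotas.

```python
def sustainable_rpm(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Requests per minute allowed once both RPM and TPM limits are applied."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # How many average-sized requests fit inside the token budget.
    tpm_bound = tpm_limit // tokens_per_request
    return min(rpm_limit, tpm_bound)


if __name__ == "__main__":
    # Hypothetical limits: 500 RPM, 200,000 TPM, 2,000 tokens per request.
    print(sustainable_rpm(500, 200_000, 2_000))
```

In this example the token budget, not the request count, is the effective ceiling.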
See also: RPM, Rate Limits
Max Output Tokens
The maximum number of tokens a model can generate in a single response. This is separate from the context window: it caps only the output portion. If your max output is 8,192 tokens, the model can generate up to ~6,000 words in one response. Longer responses require multiple API calls, for example by asking the model to continue where it stopped. Streaming delivers tokens as they are generated but does not raise this limit.
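The arithmetic above can be sketched as follows, reusing the 0.75 words-per-token rule of thumb. Both function names are hypothetical, and the word counts are approximations.

```python
import math


def approx_words(tokens: int) -> int:
    """Approximate English word count for a token budget (0.75 words/token)."""
    return round(tokens * 0.75)


def calls_needed(target_output_tokens: int, max_output_tokens: int) -> int:
    """Minimum number of API calls to produce the target amount of output."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    return math.ceil(target_output_tokens / max_output_tokens)


if __name__ == "__main__":
    print(approx_words(8_192))
    print(calls_needed(20_000, 8_192))
```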
See also: Context Window, Tokens
Pricing Tiers
Most providers offer different pricing tiers based on your usage volume or account type. Higher tiers typically offer lower per-token prices, higher rate limits, and access to premium features. Some providers also have free tiers with limited usage for testing and development.
See also: Rate Limits, Batch API
Cost per Request
The total cost of a single API call, calculated as: (input tokens × input price per token) + (output tokens × output price per token), where each per-token price is the quoted per-1M price ÷ 1,000,000. This is the most practical metric for estimating your monthly costs: multiply cost per request by your expected number of requests.
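As a worked example of this calculation, the sketch below takes prices quoted per 1M tokens and converts them inside the function. The function names and the $3.00 / $15.00 prices are illustrative assumptions, not actual rates for any model.

```python
def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Dollar cost of one call, given prices quoted per 1M tokens."""
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


def monthly_cost(per_request: float, requests_per_month: int) -> float:
    """Estimated monthly spend for a fixed per-request cost."""
    return per_request * requests_per_month


if __name__ == "__main__":
    # Hypothetical: 2,000 input and 500 output tokens per request,
    # at $3.00 (input) and $15.00 (output) per 1M tokens.
    per_request = cost_per_request(2_000, 500, 3.00, 15.00)
    print(f"per request: ${per_request:.4f}")
    print(f"per month at 100,000 requests: ${monthly_cost(per_request, 100_000):.2f}")
```

Note how the 500 output tokens cost more than the 2,000 input tokens in this example, which is why response length often dominates the bill.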
See also: Per 1M Tokens, Input vs Output Pricing
Batch API
A discounted API tier for processing large volumes of requests asynchronously. Batch APIs typically offer 50% lower prices but process requests in the background (within hours, not seconds). Ideal for non-time-sensitive workloads like data processing, content generation, and analytics.
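To estimate what a batch discount is worth, a sketch like the following can help. The 50% default mirrors the typical figure mentioned above; the function name is hypothetical, and the real discount varies by provider.

```python
def batch_cost(standard_cost: float, discount: float = 0.5) -> float:
    """Cost of a workload after applying a fractional batch discount."""
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return standard_cost * (1 - discount)


if __name__ == "__main__":
    # Hypothetical workload that would cost $1,350.00 at standard prices.
    print(f"${batch_cost(1350.0):.2f}")
```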
See also: Pricing Tiers, Rate Limits
Current Pricing at a Glance
Compare input and output pricing across popular models (per 1M tokens)
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
Calculate Your Exact Costs
Use our free calculator to estimate your monthly API spend across all 39 models from 10 providers.