Blog · Jun 7, 2026
Prompt Engineering to Reduce AI API Costs by 50%
8 techniques that actually work — with real examples using GPT-5, Claude Sonnet 4.6, and DeepSeek V3.2.
Most developers optimize their AI stack by switching to cheaper models. That's the obvious move — but it's not the biggest lever. Prompt engineering alone can cut your API costs by 30-70% without changing models, infrastructure, or architecture.
Here are 8 techniques we've seen work across hundreds of production deployments. Each one includes a before/after example with real token counts and cost calculations.
1. Output Length Control — Save 30-50%
The single biggest cost driver is output tokens. Most models charge 3-10x more for output than input. If your prompt generates 500 tokens when you only need 100, you're paying 5x too much.
// Before: verbose prompt → 500 output tokens
"Analyze this customer review and provide a detailed sentiment analysis with explanation, confidence score, key themes, and actionable recommendations for the product team."
// After: focused prompt → 80 output tokens
"Classify this review as positive/negative/neutral. Reply with JSON: {sentiment, confidence: 0-1, one_word_reason}"
Cost impact on GPT-5 ($10/M output): 500 tokens → 80 tokens = $0.0042 savings per request. At 10K requests/day, that's $1,260/month saved.
2. System Prompts Over Few-Shot — Save 10-20%
Few-shot examples eat input tokens fast. Each example is 50-200 tokens. Using 5 examples costs 250-1,000 input tokens per request. A well-written system prompt achieves the same result in 50-100 tokens.
// Before: 5 few-shot examples → 800 input tokens
"Classify the sentiment. Examples: 'Great product!' → positive. 'Terrible service' → negative. 'It's okay' → neutral. 'Love it' → positive. 'Worst experience' → negative."
// After: system prompt → 120 input tokens
"Classify text sentiment as positive, negative, or neutral. Output single word."
3. Structured Output Formats — Save 20-40%
When you ask for natural language explanations, the model generates verbose responses. When you ask for JSON or structured data, responses are 2-5x shorter and more consistent.
// Before: natural language → 200 output tokens
"Tell me about this product's features and pricing."
// After: structured output → 60 output tokens
"Extract product info as JSON: {name, features: [], price, currency}. No explanation."
4. Prompt Caching (Repeat Prefixes) — Save 50-90%
OpenAI, Anthropic, and Google all support prompt caching. If your system prompt + context is the same across requests, the cached portion costs 50-90% less. The key is keeping your prefix consistent.
How it works: If your system prompt is 500 tokens and you send 1,000 requests/day with the same prefix, the cached portion (500 tokens) costs $0.000015/token instead of $0.00015/token on GPT-5. That's $20/month saved just from caching.
5. Model Routing — Save 50-80%
Not every request needs GPT-5 or Claude Opus. Route simple tasks (classification, extraction, formatting) to budget models, and reserve premium models for complex reasoning.
| Task Type | Instead Of | Use | Savings |
|---|---|---|---|
| Sentiment classification | GPT-5 ($1.25/$10) | DeepSeek V3.2 ($0.23/$0.34) | 82% |
| Email categorization | Claude Sonnet 4.6 ($3/$15) | GPT-4o mini ($0.15/$0.60) | 95% |
| Data extraction | GPT-5 ($1.25/$10) | DeepSeek V4 Flash ($0.14/$0.28) | 89% |
| Complex reasoning | GPT-5 ($1.25/$10) | Keep GPT-5 | — |
6. Batch Processing — Save 50%
OpenAI's Batch API costs 50% less than the real-time API. If your use case can tolerate 24-hour turnaround (data processing, report generation, content moderation), batch is a no-brainer.
Cost impact: GPT-5 drops from $1.25/$10 to $0.625/$5 per 1M tokens. For a workload processing 10M tokens/day, that's $187/month saved.
7. Response Length Limits — Save 20-40%
Most APIs support a max_tokens parameter. Setting a reasonable limit prevents runaway output costs. If you only need 200 tokens, don't let the model generate 2,000.
Pro tip: Set max_tokens to 1.5x your expected output length. This catches edge cases without wasting tokens on rambling responses.
8. Prompt Compression — Save 30-60%
Remove filler words, use abbreviations, and compress instructions. The model understands compressed prompts just as well.
// Before: 85 tokens
"I would like you to please analyze the following customer review and provide me with a detailed sentiment analysis. Please include the overall sentiment, a confidence score from 0 to 1, and a brief explanation of why you chose that sentiment."
// After: 32 tokens (62% reduction)
"Analyze review sentiment. Output: {sentiment, confidence: 0-1, reason}"
Combined Impact: Real-World Example
Let's say you're running a customer support chatbot with GPT-5, processing 5,000 requests/day:
| Metric | Before | After |
|---|---|---|
| Input tokens/request | 800 | 350 |
| Output tokens/request | 400 | 120 |
| Model | GPT-5 | DeepSeek V3.2 |
| Daily cost | $25.00 | $0.63 |
| Monthly cost | $750 | $18.90 |
Total savings: $731.10/month (97.5% reduction) — from prompt optimization + model routing combined.
Calculate your exact savings
Use our cost calculator to compare models and estimate how much you'd save with these techniques.
Open Cost CalculatorQuick Reference: Which Technique Saves Most?
Highest Impact
Model routing (50-80%) + output control (30-50%)
Easiest to Implement
Response length limits + structured output
Most Overlooked
Prompt caching + batch processing
Best ROI
Prompt compression (30-60% with 5 min effort)
Pricing data verified Jun 7, 2026. Use our cost calculator to estimate savings for your specific workload. See also: 12 Ways to Reduce AI API Costs and AI API Caching Strategies.