Embedding API Cost Calculator: Estimate Your RAG Pipeline Costs
Building a RAG pipeline? You're probably budgeting for the LLM generation costs, but what about embedding? The embedding step is often overlooked, yet it's a recurring cost that scales with every document you index and every query you run.
We built a free Embedding API Cost Calculator that estimates your embedding spend across OpenAI, Cohere, and Google models. Here's what you need to know.
Try the Embedding API Cost Calculator →
Embedding Model Pricing at a Glance
| Model | Provider | Price/1M Tokens | Dimensions | Max Tokens |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | $0.02 | 1,536 | 8,191 |
| text-embedding-3-large | OpenAI | $0.13 | 3,072 | 8,191 |
| text-embedding-ada-002 | OpenAI | $0.10 | 1,536 | 8,191 |
| embed-v3 | Cohere | $0.10 | 1,024 | 512 |
| embed-multilingual-v3 | Cohere | $0.10 | 1,024 | 512 |
| text-embedding-004 | Google | Free* | 768 | 2,048 |
*Google offers a free tier for low-volume use. Pay-as-you-go pricing applies at higher volumes.
Real-World Embedding Cost Examples
Let's look at what it actually costs to embed real workloads:
| Scenario | Documents | Tokens | OpenAI Small | OpenAI Large | Cohere v3 |
|---|---|---|---|---|---|
| Small knowledge base | 1,000 | 500K | $0.01 | $0.07 | $0.05 |
| Medium documentation | 10,000 | 5M | $0.10 | $0.65 | $0.50 |
| Large enterprise corpus | 100,000 | 50M | $1.00 | $6.50 | $5.00 |
| Massive document store | 1,000,000 | 500M | $10.00 | $65.00 | $50.00 |
Assumes 500 tokens per document (~375 words). These are one-time indexing costs.
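The arithmetic behind the table is just total tokens times the per-million price. Here is a minimal Python sketch that reproduces it, assuming the list prices from the pricing table and the same 500 tokens per document; the function and dictionary names are illustrative, not part of any provider SDK.

```python
# Prices in USD per 1M tokens, copied from the pricing table in this article.
PRICE_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "embed-v3": 0.10,
}


def indexing_cost(num_documents: int, model: str, tokens_per_document: int = 500) -> float:
    """One-time cost in USD to embed every document once."""
    total_tokens = num_documents * tokens_per_document
    return total_tokens / 1_000_000 * PRICE_PER_MILLION[model]


for docs in (1_000, 10_000, 100_000, 1_000_000):
    costs = {model: indexing_cost(docs, model) for model in PRICE_PER_MILLION}
    print(f"{docs:>9,} docs:", {model: f"${cost:.2f}" for model, cost in costs.items()})
```

Swap in your own average tokens per document to see how sensitive the estimate is to document length.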
Embedding vs Generation: Where the Money Goes
In a typical RAG pipeline, embedding accounts for only a small share of total API spend; the LLM generation call dominates. Here's a sample breakdown for 1,000 RAG queries per day:
| Component | Monthly Cost | % of Total |
|---|---|---|
| Embedding (queries only) | $0.06 | ~1% |
| Generation input (GPT-5 mini) | $1.50 | ~33% |
| Generation output (GPT-5 mini) | $3.00 | ~66% |
| Total | $4.56 | 100% |
The picture holds at scale with more expensive generation models. With Claude Sonnet 4.6 ($3/$15 per 1M input/output tokens) at 10K queries/day, embedding is still ~2% of spend while generation hits $6,000+/month.
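To check a breakdown like this for your own workload, divide each component by the monthly total. The sketch below uses the dollar amounts from the 1,000 queries/day example; the component names are illustrative labels, and the helper is hypothetical rather than part of the calculator.

```python
def cost_shares(monthly_costs: dict[str, float]) -> dict[str, float]:
    """Return each component's share of the total, in percent."""
    total = sum(monthly_costs.values())
    return {name: cost / total * 100 for name, cost in monthly_costs.items()}


# Monthly USD figures from the 1,000 queries/day example.
monthly = {
    "embedding_queries": 0.06,
    "generation_input": 1.50,
    "generation_output": 3.00,
}

total = sum(monthly.values())
for name, pct in cost_shares(monthly).items():
    print(f"{name}: ${monthly[name]:.2f} ({pct:.1f}%)")
print(f"total: ${total:.2f}")
```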
How to Reduce Embedding Costs
1. Use text-embedding-3-small
At $0.02/1M tokens, OpenAI's small model is about 85% cheaper than the large model ($0.13/1M tokens) with roughly 90% of the quality. Start here and upgrade only if retrieval quality is insufficient for your use case.
2. Reduce Dimensions
text-embedding-3-large supports dimension reduction through the dimensions request parameter: for example 256d, 512d, 1024d, 1536d, or the full 3072d. Using 1024d instead of 3072d cuts vector storage by 67% with minimal quality loss. Most RAG applications perform well at 1024d.
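The storage saving follows directly from the vector size. Here is a rough sketch, assuming vectors are stored as uncompressed float32 (4 bytes per dimension) and ignoring index overhead. Real vector databases add metadata and index structures on top, so treat the output as a lower bound.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage, no index overhead


def vector_storage_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw bytes needed to store num_vectors embeddings of the given width."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


full = vector_storage_bytes(1_000_000, 3072)
reduced = vector_storage_bytes(1_000_000, 1024)
savings = 1 - reduced / full

print(f"3072d: {full / 1e9:.2f} GB, 1024d: {reduced / 1e9:.2f} GB, savings: {savings:.0%}")
```

Note that the API price per token does not depend on the output dimension in the pricing table above; the saving is on the storage and search side.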
3. Optimize Chunk Size
Smaller chunks mean more chunks per document, which means more embedding calls and more vectors to store. But smaller chunks also improve retrieval accuracy. The sweet spot is 256-512 tokens per chunk: small enough for precise retrieval, large enough to keep embedding costs reasonable.
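Because embedding is billed per token, chunk size mostly affects cost through the number of vectors you store and through any overlap between chunks, which is embedded twice. The sketch below is a simple sliding-window chunker over token IDs; the overlap of one eighth of the chunk size is an arbitrary assumption for illustration, not a recommendation from any provider.

```python
def chunk_tokens(tokens: list[int], chunk_size: int, overlap: int = 0) -> list[list[int]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


document = list(range(10_000))  # stand-in for the token IDs of one long document
for size in (256, 512, 1024):
    chunks = chunk_tokens(document, size, overlap=size // 8)
    embedded = sum(len(chunk) for chunk in chunks)
    print(f"chunk_size={size}: {len(chunks)} chunks, {embedded} tokens embedded")
```

Comparing the embedded token count against the original document length shows how much extra you pay for overlap at each chunk size.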
4. Batch API Calls
OpenAI's embeddings endpoint accepts up to 2,048 inputs per request. Batching reduces per-request overhead and can improve throughput by 10-20x compared to embedding one document at a time.
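A batching loop can be as small as the sketch below. The embed_batch callable is a hypothetical stand-in for whatever client call you use (it takes a list of strings and returns one vector per string); fake_embed_batch exists only so the example runs without an API key.

```python
from typing import Callable, Iterator, Sequence

MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit quoted in this article


def batched(items: Sequence[str], batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
) -> list[list[float]]:
    """Embed every text with one request per batch instead of one per document."""
    vectors: list[list[float]] = []
    for batch in batched(texts):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> list[list[float]]:
    # Hypothetical stand-in for a real API call; returns a 1-d "embedding" per text.
    return [[float(len(text))] for text in batch]


texts = [f"document {i}" for i in range(5000)]
vectors = embed_all(texts, fake_embed_batch)
print(len(vectors), "vectors from", len(list(batched(texts))), "requests")
```

Providers also cap total tokens per request, so a production version would likely need a token budget per batch as well as an item count.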
5. Cache Embeddings
Store embeddings in your vector database. Only re-embed when documents change. For static knowledge bases, this eliminates recurring embedding costs entirely after the initial indexing.
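One way to re-embed only what changed is to key the cache on a hash of the document text. The sketch below keeps the cache in a plain dictionary and uses a fake embed function so it runs offline; in practice the cache would live next to your vector database, and all names here are illustrative assumptions.

```python
import hashlib
from typing import Callable


def content_key(text: str) -> str:
    """Stable cache key derived from the document text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    documents: dict[str, str],
    cache: dict[str, list[float]],
    embed: Callable[[str], list[float]],
) -> dict[str, list[float]]:
    """Return an embedding per document ID, calling embed() only for unseen text."""
    result = {}
    for doc_id, text in documents.items():
        key = content_key(text)
        if key not in cache:
            cache[key] = embed(text)  # the only place that costs money
        result[doc_id] = cache[key]
    return result


calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a paid API call; records each invocation.
    calls.append(text)
    return [float(len(text))]


cache: dict[str, list[float]] = {}
docs = {"a": "hello", "b": "world"}
embed_with_cache(docs, cache, fake_embed)   # both documents are new
docs["b"] = "world, updated"
embed_with_cache(docs, cache, fake_embed)   # only the changed document is re-embedded
print(len(calls), "embedding calls in total")
```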
When to Upgrade Your Embedding Model
- Stay with text-embedding-3-small if: English-only, cost-sensitive, good enough retrieval quality
- Upgrade to text-embedding-3-large if: retrieval quality matters (legal, medical, financial), high-value queries, need 3072d for downstream tasks
- Switch to Cohere embed-multilingual-v3 if: multilingual requirements, need 100+ language support, or building for global audiences
- Use Google text-embedding-004 if: prototyping, low volume, or already on GCP with free tier credits
The Bottom Line
Embedding is one of the cheapest parts of a RAG pipeline, but it's not free. For most projects, OpenAI text-embedding-3-small at $0.02/1M tokens is the clear winner on value. Use our Embedding API Cost Calculator to estimate your exact costs, and check the RAG Cost Calculator for full pipeline cost estimation.
Estimate your full RAG pipeline cost: embedding + generation together.
Try RAG Cost Calculator →
Related Tools
- Embedding API Cost Calculator: Compare embedding models side by side
- RAG Cost Calculator: Full RAG pipeline cost estimation
- AI API Cost Calculator: Compare generation model costs
- Token Estimator: Count tokens in your text
- Cost Explorer: See all 34 models ranked by cost