How much does it cost to embed 1 million documents?

At 500 tokens per document, embedding 1M documents (500M tokens) costs: OpenAI small ($0.02/1M): $10, OpenAI large ($0.13/1M): $65, Cohere v3 ($0.10/1M): $50, Google text-embedding-004: free tier or ~$0.50. These are one-time indexing costs.

What percentage of RAG costs are embedding?

In a typical RAG pipeline, embedding is a small share of total API spend, typically no more than 5-15%, and generation (the LLM call) dominates. For 1K queries/day with GPT-5 mini, query embedding comes to about $0.06 of a $4.56 monthly total (~1%), and generation makes up the remaining ~99%.

How do I reduce embedding costs?

Five ways: 1) Use text-embedding-3-small ($0.02/1M) instead of large ($0.13/1M), about 85% savings. 2) Reduce dimensions (1024d instead of 3072d) to cut vector storage. 3) Optimize chunk size (256-512 tokens). 4) Batch API calls (2048 inputs per request). 5) Cache embeddings to avoid re-embedding.

Which embedding model is best for RAG?

For English RAG: OpenAI text-embedding-3-large ($0.13/1M) for best quality, text-embedding-3-small ($0.02/1M) for best value. For multilingual RAG: Cohere embed-v3 ($0.10/1M) supports 100+ languages. For prototyping: Google text-embedding-004 has a free tier.

How many tokens is a typical document?

A typical document is 300-700 tokens. A 500-word article is ~670 tokens (English text averages roughly 0.75 words per token). A tweet is ~30 tokens. A PDF page is ~500 tokens. For RAG, documents are typically chunked into 256-1024 token segments for optimal retrieval quality.

Embedding API Cost Calculator: Estimate RAG Pipeline Costs (2026)

Building a RAG pipeline? You're probably budgeting for the LLM generation costs — but what about embedding? The embedding step is often overlooked, yet it's a recurring cost that scales with every document you index and every query you run.


We built a free Embedding API Cost Calculator that estimates your embedding spend across OpenAI, Cohere, and Google models. Here's what you need to know.

Embedding Model Pricing at a Glance

Model	Provider	Price/1M Tokens	Dimensions	Max Tokens
text-embedding-3-small	OpenAI	$0.02	1,536	8,191
text-embedding-3-large	OpenAI	$0.13	3,072	8,191
text-embedding-ada-002	OpenAI	$0.10	1,536	8,191
embed-v3	Cohere	$0.10	1,024	512
embed-multilingual-v3	Cohere	$0.10	1,024	512
text-embedding-004	Google	Free*	768	2,048

*Google offers a free tier for low-volume use. Pay-as-you-go pricing applies at higher volumes.

Real-World Embedding Cost Examples

Let's look at what it actually costs to embed real workloads:

Scenario	Documents	Tokens	OpenAI Small	OpenAI Large	Cohere v3
Small knowledge base	1,000	500K	$0.01	$0.07	$0.05
Medium documentation	10,000	5M	$0.10	$0.65	$0.50
Large enterprise corpus	100,000	50M	$1.00	$6.50	$5.00
Massive document store	1,000,000	500M	$10.00	$65.00	$50.00

Assumes 500 tokens per document (~375 words). These are one-time indexing costs.
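Every figure in the table comes from the same arithmetic: total tokens divided by one million, times the per-million price. A minimal Python sketch, with prices hard-coded from the pricing table above (`PRICE_PER_1M` and `indexing_cost` are illustrative names, not part of any provider SDK):

```python
# Prices in USD per 1M tokens, copied from the pricing table in this article.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "embed-v3": 0.10,
}


def indexing_cost(num_docs: int, tokens_per_doc: int, model: str) -> float:
    """One-time cost in USD to embed num_docs documents."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_1M[model]


# The "massive document store" row: 1M documents at 500 tokens each.
for model in PRICE_PER_1M:
    print(f"{model}: ${indexing_cost(1_000_000, 500, model):.2f}")
```

Swap in your own document count and average token length to get a first-order estimate before you run anything against a paid API.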

Embedding vs Generation: Where the Money Goes

In a typical RAG pipeline, embedding costs are only 5-15% of total API spend, and often less once indexing is done. The LLM generation call dominates costs. Here's a breakdown for 1,000 RAG queries per day, counting query embedding only (one-time indexing is excluded):

Component	Monthly Cost	% of Total
Embedding (queries only)	$0.06	~1%
Generation input (GPT-5 mini)	$1.50	~33%
Generation output (GPT-5 mini)	$3.00	~66%
Total	$4.56	100%

With more expensive generation models, the gap widens rather than narrows. With Claude Sonnet 4.6 ($3/$15 per 1M input/output tokens) at 10K queries/day, generation hits $6,000+/month while query embedding stays a small fraction of the bill.
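Each component's share is its monthly cost divided by the monthly total. A short Python sketch using the dollar figures from the GPT-5 mini table (the `cost_shares` helper is an illustrative name), which you can rerun with your own numbers:

```python
# Monthly costs in USD, taken from the GPT-5 mini breakdown table.
monthly_costs = {
    "embedding (queries only)": 0.06,
    "generation input": 1.50,
    "generation output": 3.00,
}


def cost_shares(costs: dict[str, float]) -> dict[str, float]:
    """Return each component's share of the total, as a percentage."""
    total = sum(costs.values())
    return {name: 100 * cost / total for name, cost in costs.items()}


for name, share in cost_shares(monthly_costs).items():
    print(f"{name}: {share:.1f}%")
```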

How to Reduce Embedding Costs

1. Use text-embedding-3-small

At $0.02/1M tokens versus $0.13/1M, OpenAI's small model is about 85% cheaper than large, with roughly 90% of the quality. Start here and upgrade only if retrieval quality is insufficient for your use case.

2. Reduce Dimensions

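text-embedding-3-large supports dimension reduction through the API's dimensions parameter, for example 256d, 512d, 1024d, or 1536d instead of the default 3072d. Using 1024d instead of 3072d reduces storage costs by 67% with minimal quality loss. The saving is in vector storage and search, not in the per-token embedding price. Most RAG applications perform well at 1024d.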
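The 67% figure follows directly from the ratio of dimensions. A rough Python sketch of the storage arithmetic, assuming vectors are stored as 32-bit floats and ignoring index overhead (both are assumptions; your vector database may differ):

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a 32-bit float


def vector_storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


full = vector_storage_bytes(1_000_000, 3072)
reduced = vector_storage_bytes(1_000_000, 1024)
savings = 1 - reduced / full

print(f"3072d: {full / 1e9:.1f} GB, 1024d: {reduced / 1e9:.1f} GB, saved: {savings:.0%}")
# prints: 3072d: 12.3 GB, 1024d: 4.1 GB, saved: 67%
```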

3. Optimize Chunk Size

Smaller chunks mean more vectors to store and more inputs to send, although the total token count (and so the per-token embedding bill) stays roughly the same unless chunks overlap. Smaller chunks also improve retrieval accuracy. The sweet spot is 256-512 tokens per chunk: small enough for precise retrieval, large enough to keep vector counts and storage reasonable.
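To see how chunk size changes the number of chunks (and therefore vectors) you produce, here is a small Python sketch. It works on a pre-tokenized list and uses no overlap; the `chunk_tokens` helper and the placeholder tokens are illustrative, and a real pipeline would use the model's own tokenizer:

```python
def chunk_tokens(tokens: list[str], chunk_size: int) -> list[list[str]]:
    """Split a token list into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Stand-in for a tokenized 2,000-token document.
doc_tokens = [f"tok{i}" for i in range(2000)]

for size in (256, 512, 1024):
    chunks = chunk_tokens(doc_tokens, size)
    total = sum(len(c) for c in chunks)
    print(f"chunk_size={size}: {len(chunks)} chunks, {total} tokens embedded")
```

Without overlap, the total tokens embedded is the same at every chunk size; only the chunk count changes. Adding overlap between chunks would raise the token total as well.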

4. Batch API Calls

OpenAI's embeddings endpoint accepts up to 2,048 inputs per request. Batching reduces API overhead and can improve throughput by 10-20x compared to single-document embedding.
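A minimal Python sketch of the batching step, assuming the 2,048-input limit cited above. The `batched` helper is an illustrative name, and the actual API call is left out; each yielded list would be sent as one request:

```python
from typing import Iterator

MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit cited in this article


def batched(items: list[str], batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = [f"chunk {i}" for i in range(5000)]
batches = list(batched(chunks))

print(len(batches), [len(b) for b in batches])
# prints: 3 [2048, 2048, 904]
```

Providers may also cap the total tokens per request, so check the current limits for your model before relying on the input count alone.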

5. Cache Embeddings

Store embeddings in your vector database. Only re-embed when documents change. For static knowledge bases, this eliminates recurring embedding costs entirely after the initial indexing.
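One common way to re-embed only when documents change is to key stored embeddings by a hash of the text. A minimal in-memory Python sketch; `EmbeddingCache` and `fake_embed` are hypothetical names, and `fake_embed` stands in for a real embedding API call:

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings by content hash so unchanged text is never re-embedded."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the embedding function was called

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> list[float]:
    # Placeholder for a real (billed) embedding API call.
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
cache.get("refund policy")     # first time: embedded
cache.get("refund policy")     # unchanged text: served from the cache
cache.get("refund policy v2")  # changed text: embedded again
print(cache.misses)
# prints: 2
```

In production the dictionary would be replaced by your vector database or a persistent key-value store, with the content hash kept alongside each vector.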

When to Upgrade Your Embedding Model

Stay with text-embedding-3-small if: English-only, cost-sensitive, good enough retrieval quality
Upgrade to text-embedding-3-large if: retrieval quality matters (legal, medical, financial), high-value queries, need 3072d for downstream tasks
Switch to Cohere embed-v3 if: multilingual requirements, need 100+ language support, or building for global audiences
Use Google text-embedding-004 if: prototyping, low volume, or already on GCP with free tier credits

The Bottom Line

Embedding is one of the cheapest parts of a RAG pipeline — but it's not free. For most projects, OpenAI text-embedding-3-small at $0.02/1M tokens is the clear winner on value. Use our Embedding API Cost Calculator to estimate your exact costs, and check the RAG Cost Calculator for full pipeline cost estimation.

Related Tools

Embedding API Cost Calculator — Compare embedding models side by side
RAG Cost Calculator — Full RAG pipeline cost estimation
AI API Cost Calculator — Compare generation model costs
Token Estimator — Count tokens in your text
Cost Explorer — See all 67 models ranked by cost
