Embedding Models Pricing - March 2026
Embedding API costs compared for OpenAI, Cohere, Voyage AI, Google, Mistral, and Jina - normalized to price per million tokens with MTEB quality scores.

TL;DR
- Cheapest commercial embedding: Mistral Embed at $0.01/MTok, with solid quality for general retrieval
- Best value: OpenAI text-embedding-3-small at $0.02/MTok balances cost and MTEB performance
- Highest quality: Google Gemini Embedding 001 tops the English MTEB at 68.3, with a generous free tier and paid access via Vertex AI; the newer multimodal Gemini Embedding 2 is priced at $0.20/MTok
- Open-source options (NV-Embed-v2, Qwen3-Embedding) are free and competitive on benchmarks - you only pay for compute
Quick Verdict
For most RAG pipelines and search applications, OpenAI's text-embedding-3-small at $0.02 per million tokens hits the sweet spot. It scores well on MTEB retrieval benchmarks and costs almost nothing at scale. If you need maximum retrieval accuracy and don't mind paying more, Voyage AI's voyage-3.5 ($0.06/MTok) or Cohere's Embed 4 ($0.12/MTok) deliver measurably better results on domain-specific tasks. Teams already on Google Cloud should look at Gemini Embedding 001 - it leads the English MTEB leaderboard and the free tier is generous. For a deeper dive into how embeddings work and where they fit in your stack, see our guide to AI embeddings and the MTEB leaderboard.
Normalized Pricing Table
All prices per million tokens (MTok). Embeddings are input-only - there are no output token costs. MTEB scores from the latest English leaderboard where available. Sorted by price.
| Model | Provider | Price (/1M tokens) | Dimensions | Max Tokens | MTEB Score | Notes |
|---|---|---|---|---|---|---|
| Mistral Embed | Mistral | $0.01 | 1,024 | 8,192 | ~63 | Budget general-purpose |
| text-embedding-3-small | OpenAI | $0.02 | 1,536 | 8,191 | 62.3 | Best budget pick |
| voyage-3.5-lite | Voyage AI | $0.02 | 1,024 | 32,000 | ~64 | Long context budget |
| Gemini Embedding 001 | Google | Free* | 3,072 | 8,192 | 68.3 | *Free tier; paid via Vertex AI |
| voyage-3.5 | Voyage AI | $0.06 | 1,024 | 32,000 | ~67 | Strong retrieval |
| Cohere Embed v3 | Cohere | $0.10 | 1,024 | 512 | 64.5 | Text only |
| Cohere Embed 4 | Cohere | $0.12 | 1,536 | 8,192 | ~66 | Multimodal (text + images) |
| text-embedding-3-large | OpenAI | $0.13 | 3,072 | 8,191 | 64.6 | Higher quality, 6.5x cost |
| Codestral Embed | Mistral | $0.15 | 1,536 | 32,768 | N/A | Code-specialized |
| voyage-code-3 | Voyage AI | $0.18 | 1,024 | 32,000 | N/A | Code retrieval specialist |
| Gemini Embedding 2 | Google | $0.20 | 3,072 | 8,192 | N/A | Multimodal (text, images, video) |
| Jina Embeddings v4 | Jina AI | Contact sales | 4,096 | 32,768 | ~67 | Multimodal, 3.8B params |
Open-Source Alternatives (Self-Hosted)
These models are free to download and run. Your only cost is compute.
| Model | Parameters | Dimensions | MTEB Score | Notes |
|---|---|---|---|---|
| NV-Embed-v2 | 7B | 4,096 | 72.3 | NVIDIA, top English MTEB |
| Qwen3-Embedding-8B | 8B | 2,048 | 70.6 | Strong multilingual |
| BGE-M3 | 568M | 1,024 | 63.0 | Lightweight, multilingual |
| Llama-Embed-Nemotron-8B | 8B | 4,096 | N/A | Top multilingual MTEB |
| EmbeddingGemma-300M | 300M | 768 | ~60 | Ultra-lightweight, on-device |
Running NV-Embed-v2 on a single A100 costs roughly $1-2/hour on major cloud providers. At typical throughput (~5,000 embeddings per second), that translates to about $0.001 per million tokens - 10-20x cheaper than the cheapest commercial API. The tradeoff is operational complexity. For guidance on self-hosting, see our guide to running open-source models locally.
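As a rough sanity check, the figures above can be plugged into a one-line formula. This is a sketch under stated assumptions: the $1.50/hour A100 rate is within the range quoted above, and ~100 tokens per embedded chunk is an illustrative guess, since throughput is quoted in embeddings rather than tokens.

```python
# Back-of-envelope: cost per million tokens for a self-hosted embedding model.
def self_hosted_cost_per_mtok(gpu_hourly_usd: float,
                              embeddings_per_sec: float,
                              tokens_per_embedding: float) -> float:
    tokens_per_hour = embeddings_per_sec * tokens_per_embedding * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# ~$1.50/hr A100, ~5,000 embeddings/sec, ~100 tokens per chunk (assumed)
print(f"${self_hosted_cost_per_mtok(1.50, 5_000, 100):.4f}/MTok")  # $0.0008/MTok
```

Vary `tokens_per_embedding` to match your own chunking; the result stays an order of magnitude below the cheapest commercial API across plausible values.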
Hidden Costs
Dimension Count Affects Storage
A 3,072-dimension embedding (OpenAI large, Gemini) takes 3x the vector database storage of a 1,024-dimension one (Cohere, Voyage). At 100 million documents, that's the difference between ~1.2 TB and ~400 GB in Pinecone or Qdrant. Storage costs can dwarf embedding API costs at scale.
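The storage arithmetic is simply dimensions times bytes per value times vector count. A minimal sketch, assuming uncompressed float32 vectors (quantized or compressed indexes would shrink both figures proportionally):

```python
# Raw vector storage footprint, assuming float32 (4 bytes per dimension).
def storage_bytes(dimensions: int, num_vectors: int, bytes_per_dim: int = 4) -> int:
    return dimensions * bytes_per_dim * num_vectors

docs = 100_000_000
print(f"{storage_bytes(3072, docs) / 1e12:.2f} TB")  # 3,072-dim: 1.23 TB
print(f"{storage_bytes(1024, docs) / 1e9:.1f} GB")   # 1,024-dim: 409.6 GB
```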
Short Context Limits
Cohere Embed v3 maxes out at 512 tokens per chunk. If your documents are longer, you'll need more chunks, more embeddings, and more vector storage. Newer models (voyage-3.5, Jina v4, Codestral Embed) support 32K tokens, cutting chunk counts by up to ~60x.
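The chunk-count effect can be sketched directly. This ignores chunk overlap, which real pipelines usually add, so treat it as a lower bound:

```python
import math

# Embeddings needed per document for a given model context limit.
def chunks_needed(doc_tokens: int, max_tokens_per_chunk: int) -> int:
    return math.ceil(doc_tokens / max_tokens_per_chunk)

doc_tokens = 30_000  # a hypothetical long document
print(chunks_needed(doc_tokens, 512))     # 512-token limit (Cohere Embed v3): 59
print(chunks_needed(doc_tokens, 32_000))  # 32K limit (voyage-3.5, Jina v4): 1
```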
Batch API Discounts
OpenAI offers 50% off embeddings via batch API ($0.01/MTok for small, $0.065/MTok for large). Voyage AI gives 33% off through their batch endpoint. Cohere and Mistral don't currently offer embedding batch discounts. For pipelines that don't need real-time results, batch pricing is a significant lever.
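For pipelines that tolerate latency, the discount math looks like this (discount rates are those quoted above; check current provider docs before relying on them):

```python
# Effective embedding cost with an optional batch discount.
def batch_cost_usd(tokens: int, price_per_mtok: float, discount: float = 0.0) -> float:
    return tokens / 1_000_000 * price_per_mtok * (1 - discount)

corpus = 1_000_000_000  # 1B tokens
print(batch_cost_usd(corpus, 0.02, 0.50))  # OpenAI 3-small, batch: 10.0
print(batch_cost_usd(corpus, 0.02))        # same model, real-time: 20.0
```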
Re-embedding Costs
Switching models means re-embedding your entire corpus. At 1 billion tokens, that's $20 with text-embedding-3-small or $120 with Cohere Embed 4. Plan model selection carefully - the cheapest model isn't always cheapest long-term if you outgrow it.
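The migration figures above come from the same per-token arithmetic; a minimal sketch:

```python
# Cost to re-embed a corpus when switching embedding models.
def reembed_cost_usd(corpus_tokens: int, price_per_mtok: float) -> float:
    return corpus_tokens / 1_000_000 * price_per_mtok

corpus = 1_000_000_000  # 1B tokens
for model, price in [("text-embedding-3-small", 0.02), ("Cohere Embed 4", 0.12)]:
    print(f"{model}: ${reembed_cost_usd(corpus, price):.0f}")
# text-embedding-3-small: $20
# Cohere Embed 4: $120
```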
Free Tier Comparison
| Provider | Free Offering | Included Usage | Expiration |
|---|---|---|---|
| Google (Gemini) | Free tier for Gemini Embedding 001 | 1,500 RPD | None |
| Voyage AI | Free credits | 200M tokens | None |
| Jina AI | Free trial | 10M tokens | None |
| OpenAI | $5 trial credits | ~250M tokens (3-small) | 3 months |
| Mistral | Free tier (some models) | Limited RPM | None |
| Cohere | Trial key | 1,000 calls/month | None |
Voyage AI's free tier is the most generous for embeddings specifically - 200 million tokens lets you embed a sizable corpus before paying anything. Google's free Gemini Embedding 001 access is also strong, though rate-limited to 1,500 requests per day.
Price History
Mar 2026 - Google launched Gemini Embedding 2 (multimodal) at $0.20/MTok. Text-embedding-004 deprecated.
Jan 2026 - Cohere released Embed 4 (multimodal) at $0.12/MTok, up from Embed v3's $0.10. Image embedding priced at $0.47/MTok.
May 2025 - Mistral launched Codestral Embed at $0.15/MTok for code-specialized retrieval.
Jan 2025 - OpenAI held text-embedding-3-small at $0.02/MTok, unchanged since launch.
Nov 2024 - Voyage AI released voyage-3.5 at $0.06/MTok, a meaningful quality jump over voyage-2 at the same price.
Embedding prices have been remarkably stable compared to LLM API prices. The main movement is in capability (multimodal support, longer context) rather than cost. Open-source models are the real disruptor - NV-Embed-v2 and Qwen3-Embedding now match or exceed commercial models on MTEB while costing only compute.
FAQ
Which embedding model is cheapest per million tokens?
Mistral Embed at $0.01/MTok is the cheapest commercial option. OpenAI's text-embedding-3-small at $0.02/MTok offers better quality for only $0.01 more. Self-hosted models can drop to ~$0.001/MTok on cloud GPUs.
What's the best embedding model for RAG?
For most RAG use cases, OpenAI text-embedding-3-small or Voyage voyage-3.5 offer the best quality-to-cost ratio. Google's Gemini Embedding 001 leads MTEB but requires the Google Cloud ecosystem.
Are open-source embedding models good enough?
Yes. NV-Embed-v2 scores 72.3 on English MTEB, beating every commercial API. The catch is you need GPU infrastructure to run it. For teams with existing GPU resources, open-source is the clear winner.
How much does it cost to embed 1 million documents?
Assuming 500 tokens per document (about 375 words): 500M total tokens. At $0.02/MTok (OpenAI small), that's $10. At $0.12/MTok (Cohere Embed 4), that's $60. Self-hosted NV-Embed-v2 would cost roughly $0.50 in compute.
Should I pick a model based on MTEB score or price?
Neither alone. Test with your actual data. MTEB measures average performance across dozens of tasks. Your retrieval accuracy on your specific domain can differ by 10+ points from the benchmark average. Run A/B tests before committing.
✓ Last verified March 11, 2026
