Embedding Models Pricing - March 2026

Embedding API costs compared for OpenAI, Cohere, Voyage AI, Google, Mistral, and Jina - normalized to price per million tokens with MTEB quality scores.


TL;DR

  • Cheapest commercial embedding: Mistral Embed at $0.01/MTok, with solid quality for general retrieval
  • Best value: OpenAI text-embedding-3-small at $0.02/MTok balances cost and MTEB performance
  • Highest quality: Google Gemini Embedding 001 tops the English MTEB at 68.32 and has a generous free tier; the newer multimodal Gemini Embedding 2 is priced at $0.20/MTok
  • Open-source options (NV-Embed-v2, Qwen3-Embedding) are free and competitive on benchmarks - you only pay for compute

Quick Verdict

For most RAG pipelines and search applications, OpenAI's text-embedding-3-small at $0.02 per million tokens hits the sweet spot. It scores well on MTEB retrieval benchmarks and costs almost nothing at scale. If you need maximum retrieval accuracy and don't mind paying more, Voyage AI's voyage-3.5 ($0.06/MTok) or Cohere's Embed 4 ($0.12/MTok) deliver measurably better results on domain-specific tasks. Teams already on Google Cloud should look at Gemini Embedding 001 - it leads the English MTEB leaderboard and the free tier is generous. For a deeper dive into how embeddings work and where they fit in your stack, see our guide to AI embeddings and the MTEB leaderboard.

Normalized Pricing Table

All prices per million tokens (MTok). Embeddings are input-only - there are no output token costs. MTEB scores from the latest English leaderboard where available. Sorted by price.

| Model | Provider | Price (/1M tokens) | Dimensions | Max Tokens | MTEB Score | Notes |
|---|---|---|---|---|---|---|
| Mistral Embed | Mistral | $0.01 | 1,024 | 8,192 | ~63 | Budget general-purpose |
| text-embedding-3-small | OpenAI | $0.02 | 1,536 | 8,191 | 62.3 | Best budget pick |
| voyage-3.5-lite | Voyage AI | $0.02 | 1,024 | 32,000 | ~64 | Long-context budget |
| Gemini Embedding 001 | Google | Free* | 3,072 | 8,192 | 68.3 | *Free tier; paid via Vertex |
| voyage-3.5 | Voyage AI | $0.06 | 1,024 | 32,000 | ~67 | Strong retrieval |
| Cohere Embed v3 | Cohere | $0.10 | 1,024 | 512 | 64.5 | Text only |
| Cohere Embed 4 | Cohere | $0.12 | 1,536 | 8,192 | ~66 | Multimodal (text + images) |
| text-embedding-3-large | OpenAI | $0.13 | 3,072 | 8,191 | 64.6 | Higher quality, 6.5x cost |
| Codestral Embed | Mistral | $0.15 | 1,536 | 32,768 | N/A | Code-specialized |
| voyage-code-3 | Voyage AI | $0.18 | 1,024 | 32,000 | N/A | Code-retrieval specialist |
| Gemini Embedding 2 | Google | $0.20 | 3,072 | 8,192 | N/A | Multimodal (text, images, video) |
| Jina Embeddings v4 | Jina AI | Contact sales | 4,096 | 32,768 | ~67 | Multimodal, 3.8B params |

Open-Source Alternatives (Self-Hosted)

These models are free to download and run. Your only cost is compute.

| Model | Parameters | Dimensions | MTEB Score | Notes |
|---|---|---|---|---|
| NV-Embed-v2 | 7B | 4,096 | 72.3 | NVIDIA, top English MTEB |
| Qwen3-Embedding-8B | 8B | 2,048 | 70.6 | Strong multilingual |
| BGE-M3 | 568M | 1,024 | 63.0 | Lightweight, multilingual |
| Llama-Embed-Nemotron-8B | 8B | 4,096 | N/A | Top multilingual MTEB |
| EmbeddingGemma-300M | 300M | 768 | ~60 | Ultra-lightweight, on-device |

Running NV-Embed-v2 on a single A100 costs roughly $1-2/hour on major cloud providers. At typical throughput (~5,000 embeddings per second), that translates to about $0.001 per million tokens - 10-20x cheaper than the cheapest commercial API. The tradeoff is operational complexity. For guidance on self-hosting, see our guide to running open-source models locally.
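The self-hosting arithmetic above can be sketched as a quick estimator. The GPU hourly rate, throughput, and tokens-per-chunk figures below are illustrative assumptions, not measurements; plug in your own numbers:

```python
def self_host_cost_per_mtok(gpu_dollars_per_hour: float,
                            embeddings_per_second: float,
                            tokens_per_embedding: float) -> float:
    """Rough dollars per million tokens for a single-GPU deployment."""
    tokens_per_hour = embeddings_per_second * tokens_per_embedding * 3600
    mtok_per_hour = tokens_per_hour / 1_000_000
    return gpu_dollars_per_hour / mtok_per_hour

# A100 at $1.50/hr, ~5,000 embeddings/s, ~100-token chunks (all assumed)
cost = self_host_cost_per_mtok(1.50, 5_000, 100)
print(f"${cost:.4f}/MTok")
```

With these inputs the estimate lands in the sub-$0.001/MTok range the article cites; longer chunks push the per-token cost lower still, since each forward pass amortizes over more tokens.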

Hidden Costs

Dimension Count Affects Storage

A 3,072-dimension embedding (OpenAI large, Gemini) takes 3x the vector database storage of a 1,024-dimension one (Cohere, Voyage). At 100 million documents, that's the difference between ~1.2 TB and ~400 GB in Pinecone or Qdrant. Storage costs can dwarf embedding API costs at scale.
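A rough storage estimate behind those numbers, assuming float32 vectors (4 bytes per dimension) and ignoring index overhead:

```python
def vector_storage_gb(num_docs: int, dimensions: int,
                      bytes_per_value: int = 4) -> float:
    """Raw vector storage in GB: docs x dims x bytes per value."""
    return num_docs * dimensions * bytes_per_value / 1e9

print(vector_storage_gb(100_000_000, 3072))  # 1228.8 GB (~1.2 TB)
print(vector_storage_gb(100_000_000, 1024))  # 409.6 GB (~400 GB)
```

Quantized indexes (int8, binary) shrink this further, but the 3x ratio between dimension counts holds regardless of precision.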

Short Context Limits

Cohere Embed v3 maxes out at 512 tokens per chunk. If your documents are longer, you'll need more chunks, more embeddings, and more vector storage. Newer models (Voyage 3.5, Jina v4, Codestral Embed) support 32K tokens, cutting chunk counts by up to ~60x.
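The chunk-count impact is easy to see with a small helper, assuming fixed-size chunks with optional overlap:

```python
import math

def chunks_needed(doc_tokens: int, max_context: int, overlap: int = 0) -> int:
    """Chunks required to cover a document at a given context limit."""
    stride = max_context - overlap
    return max(1, math.ceil((doc_tokens - overlap) / stride))

# A 10,000-token document:
print(chunks_needed(10_000, 512))     # 20 chunks at a 512-token limit
print(chunks_needed(10_000, 32_000))  # 1 chunk with a 32K-context model
```

Overlapping chunks (common in RAG pipelines) inflate the count further, so short context limits compound with overlap.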

Batch API Discounts

OpenAI offers 50% off embeddings via batch API ($0.01/MTok for small, $0.065/MTok for large). Voyage AI gives 33% off through their batch endpoint. Cohere and Mistral don't currently offer embedding batch discounts. For pipelines that don't need real-time results, batch pricing is a significant lever.
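As a sketch of what batch pricing is worth, using the OpenAI rates above (the monthly volume is a made-up example, not a benchmark):

```python
def annual_batch_savings(mtok_per_month: float,
                         realtime_rate: float,
                         batch_rate: float) -> float:
    """Yearly savings from routing embedding traffic through batch pricing."""
    return (realtime_rate - batch_rate) * mtok_per_month * 12

# 10,000 MTok/month on text-embedding-3-small: $0.02 realtime vs $0.01 batch
print(annual_batch_savings(10_000, 0.02, 0.01))
```

At that volume the discount saves about $1,200 a year; whether that matters depends on whether your pipeline can tolerate batch turnaround times.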

Re-embedding Costs

Switching models means re-embedding your entire corpus. At 1 billion tokens, that's $20 with text-embedding-3-small or $120 with Cohere Embed 4. Plan model selection carefully - the cheapest model isn't always cheapest long-term if you outgrow it.
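The re-embedding arithmetic above, as a one-line cost helper:

```python
def embedding_cost(total_tokens: float, rate_per_mtok: float) -> float:
    """Total dollars to embed a corpus at a given per-MTok rate."""
    return total_tokens / 1e6 * rate_per_mtok

# Re-embedding a 1-billion-token corpus:
print(embedding_cost(1e9, 0.02))  # 20.0  (text-embedding-3-small)
print(embedding_cost(1e9, 0.12))  # 120.0 (Cohere Embed 4)
```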

Free Tier Comparison

| Provider | Free Offering | Allowance | Expiration |
|---|---|---|---|
| Google (Gemini) | Free tier for Gemini Embedding 001 | 1,500 requests/day | None |
| Voyage AI | Free credits | 200M tokens | None |
| Jina AI | Free trial | 10M tokens | None |
| OpenAI | $5 trial credits | ~250M tokens (3-small) | 3 months |
| Mistral | Free tier (some models) | Limited RPM | None |
| Cohere | Trial key | 1,000 calls/month | None |

Voyage AI's free tier is the most generous for embeddings specifically - 200 million tokens lets you embed a sizable corpus before paying anything. Google's free Gemini Embedding 001 access is also strong, though rate-limited to 1,500 requests per day.

Price History

  • Mar 2026 - Google launched Gemini Embedding 2 (multimodal) at $0.20/MTok. Text-embedding-004 deprecated.

  • Jan 2026 - Cohere released Embed 4 (multimodal) at $0.12/MTok, up from Embed v3's $0.10. Image embedding priced at $0.47/MTok.

  • May 2025 - Mistral launched Codestral Embed at $0.15/MTok for code-specialized retrieval.

  • Jan 2025 - OpenAI held text-embedding-3-small at $0.02/MTok, unchanged since launch.

  • Nov 2024 - Voyage AI released voyage-3.5 at $0.06/MTok, a meaningful quality jump over voyage-2 at the same price.

Embedding prices have been remarkably stable compared to LLM API prices. The main movement is in capability (multimodal support, longer context) rather than cost. Open-source models are the real disruptor - NV-Embed-v2 and Qwen3-Embedding now match or exceed commercial models on MTEB while costing only compute.

FAQ

Which embedding model is cheapest per million tokens?

Mistral Embed at $0.01/MTok is the cheapest commercial option. OpenAI's text-embedding-3-small at $0.02/MTok offers better quality for only $0.01 more. Self-hosted models can drop to ~$0.001/MTok on cloud GPUs.

What's the best embedding model for RAG?

For most RAG use cases, OpenAI text-embedding-3-small or Voyage voyage-3.5 offer the best quality-to-cost ratio. Google's Gemini Embedding 001 leads MTEB but requires the Google Cloud ecosystem.

Are open-source embedding models good enough?

Yes. NV-Embed-v2 scores 72.3 on English MTEB, beating every commercial API. The catch is you need GPU infrastructure to run it. For teams with existing GPU resources, open-source is the clear winner.

How much does it cost to embed 1 million documents?

Assuming 500 tokens per document (about 375 words): 500M total tokens. At $0.02/MTok (OpenAI small), that's $10. At $0.12/MTok (Cohere Embed 4), that's $60. Self-hosted NV-Embed-v2 would cost roughly $0.50 in compute.

Should I pick a model based on MTEB score or price?

Neither alone. Test with your actual data. MTEB measures average performance across dozens of tasks. Your retrieval accuracy on your specific domain can differ by 10+ points from the benchmark average. Run A/B tests before committing.



✓ Last verified March 11, 2026

About the author

James, AI Benchmarks & Tools Analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.