Embedding Model Leaderboard: MTEB Rankings March 2026
Rankings of the best embedding models by MTEB scores, comparing retrieval quality, dimensions, speed, and pricing for RAG and search.

Embedding models don't get the same attention as chatbots or reasoning engines, but they're the backbone of every retrieval-augmented generation pipeline, every semantic search system, and most production AI applications that need to find relevant information quickly. Pick the wrong embedding model and your RAG app returns garbage. Pick an expensive one and your costs scale linearly with every document you index.
The Massive Text Embedding Benchmark (MTEB) remains the standard way to compare these models, and the leaderboard has shifted considerably since our last rankings. Google's Gemini Embedding 001 currently holds the top spot on the English MTEB leaderboard with an average score of 68.32. But the raw number doesn't tell the whole story - open-source models from Qwen and NVIDIA are closing the gap fast, and pricing varies by orders of magnitude between providers.
TL;DR
- Gemini Embedding 001 leads the English MTEB leaderboard at 68.32, with a +5.09 gap over the next best model
- NVIDIA's Llama-Embed-Nemotron-8B tops the multilingual MTEB and is fully open-weight, making it the strongest free option for global applications
- OpenAI's text-embedding-3-small at $0.02/million tokens remains the best value for budget-conscious teams that don't need peak retrieval accuracy
What MTEB Actually Measures
MTEB evaluates embedding models across eight task categories, each testing a different capability:
Retrieval - the most important task for RAG applications. Models embed queries and documents separately, then retrieval is scored using nDCG@10 (normalized discounted cumulative gain at rank 10). This is where most practitioners should focus.
Semantic Textual Similarity (STS) - measures how well cosine similarity between embeddings correlates with human judgments of sentence similarity, scored by Spearman correlation.
Classification - embeddings are fed to a simple linear classifier. If the embeddings capture meaning well, even a basic classifier should perform strongly.
Clustering - mini-batch k-means on the embeddings, assessed using v-measure. Tests whether the model creates natural groupings in vector space.
Pair Classification - binary predictions using distance thresholds between embedding pairs. Think duplicate detection or paraphrase identification.
Reranking - cosine similarity between query and document embeddings, scored by MAP and MRR. Tests whether the model can rank relevant documents above irrelevant ones.
Bitext Mining - finding matching sentence pairs across languages, scored by F1.
Summarization - comparing machine summaries to human references via embedding similarity.
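To make the headline retrieval metric concrete, here is a minimal pure-Python sketch of nDCG@10: discounted cumulative gain over the model's ranking, normalized by the ideal ranking. The relevance judgments below are invented for illustration; MTEB computes this over real qrels.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: relevance discounted by log2 of rank."""
    return sum(rel / math.log2(rank + 2)  # rank 0 -> log2(2) = 1, no discount
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the model's ranking divided by the ideal DCG."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: graded relevance of the top-5 documents as ranked by a model.
# The model put a mildly relevant doc (1) below an irrelevant one (0).
print(round(ndcg_at_k([3, 2, 0, 1, 0]), 4))
```

A perfect ranking scores exactly 1.0; any misordering of relevant documents pulls the score below that, with mistakes near the top of the list penalized hardest.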
The English MTEB leaderboard covers 56 datasets across these categories. The newer multilingual leaderboard (MMTEB) spans 131 tasks across 250+ languages and uses Borda count for aggregation, which rewards models that perform consistently well across many tasks rather than dominating a few.
One important caveat: MTEB scores are self-reported. Model providers submit their own results, and while the evaluation code is open source, there's no independent verification step. Keep that in mind when a vendor touts their MTEB rank in a press release.
Rankings: March 2026
| Rank | Model | Provider | MTEB Avg | Retrieval | Dims | Max Tokens | Pricing (per 1M tokens) |
|---|---|---|---|---|---|---|---|
| 1 | Gemini Embedding 001 | Google | 68.32 | 67.71 | 3072 | 8192 | ~$0.004/1K chars |
| 2 | NV-Embed-v2 | NVIDIA | 72.31* | 62.65 | 4096 | 32768 | Free (open-weight) |
| 3 | Qwen3-Embedding-8B | Qwen/Alibaba | 70.58** | - | 4096 | 32768 | Free (open-weight) |
| 4 | BGE-en-ICL | BAAI | 71.24* | - | 4096 | 32768 | Free (open-weight) |
| 5 | GTE-Qwen2-7B-instruct | Alibaba | 70.24* | - | 3584 | 32768 | Free (open-weight) |
| 6 | Voyage-3-large | Voyage AI | 66.80 | - | 2048 | 32768 | $0.06 |
| 7 | Jina Embeddings v3 | Jina AI | 65.52 | - | 1024 | 8192 | Free tier + paid |
| 8 | Cohere Embed v4 | Cohere | 65.20 | - | 1024 | 512 | $0.12 |
| 9 | text-embedding-3-large | OpenAI | 64.60 | - | 3072 | 8191 | $0.13 |
| 10 | BGE-M3 | BAAI | 63.00 | - | 1024 | 8192 | Free (open-weight) |
| 11 | Nomic Embed v1.5 | Nomic AI | 62.39 | - | 768 | 8192 | $0.05 |
| 12 | text-embedding-3-small | OpenAI | 62.26 | - | 1536 | 8191 | $0.02 |
*NV-Embed-v2 and BGE-en-ICL scores from legacy MTEB (56 tasks). **Qwen3-Embedding-8B score from MMTEB multilingual leaderboard. Cross-leaderboard comparisons should be treated as approximate.
A few things stand out immediately. The top five models by raw score are all either free and open-weight or very cheap. That wasn't true a year ago, when OpenAI and Cohere led the upper ranks. The open-source ecosystem has caught up and, on pure benchmark numbers, surpassed the commercial APIs.
But raw MTEB averages can be misleading. NV-Embed-v2 posts 72.31 overall, yet its retrieval score (62.65) trails Gemini Embedding 001's 67.71 by a wide margin. If retrieval is your primary use case - and for most RAG applications, it is - Gemini's lead is more meaningful than the overall average suggests.
Key Takeaways
Google's Quiet Dominance
Gemini Embedding 001 doesn't get much buzz compared to Google's flagship chat models, but its MTEB numbers are hard to argue with. A 68.32 average with a +5.09 lead over the second-place model on the refreshed English leaderboard is a major gap. The model supports Matryoshka Representation Learning, so you can truncate from 3072 dimensions down to 768 with minimal quality loss. Pricing through the Gemini API is effectively negligible at around $0.004 per 1,000 characters.
The catch: you're locked into Google's API. For teams already on Google Cloud, that's fine. For everyone else, vendor lock-in on your embedding layer is a real concern, since switching embedding models means re-indexing your entire document corpus.
Open-Weight Models Are Production-Ready
NVIDIA's Llama-Embed-Nemotron-8B leads the multilingual MTEB, and NV-Embed-v2 holds a 72.31 average on the English benchmark. Both are open-weight. Qwen3-Embedding-8B scores 70.58 on the multilingual leaderboard and supports flexible dimensions from 32 to 4096. BGE-en-ICL from BAAI hits 71.24 with in-context learning capabilities that let you boost performance on specific tasks by providing a few examples in the query.
These aren't research toys anymore. All of them run well on a single A100 or even consumer GPUs with quantization. If you have the infrastructure, self-hosting one of these models eliminates per-token costs completely and gives you full control over latency and data privacy.
The API Pricing Spread Is Enormous
The cost spread between embedding APIs exceeds 6x, even within a single vendor's lineup. OpenAI's text-embedding-3-small costs $0.02 per million tokens. Their own text-embedding-3-large costs $0.13 - a 6.5x markup for a model that scores only 2.34 points higher on MTEB. Cohere Embed v4 sits at $0.12 with multimodal support (text and images). Voyage-3-large costs $0.06 and offers 200 million free tokens per account.
For a corpus of 1 million documents averaging roughly 1,000 tokens each, the total embedding cost ranges from roughly $20 (OpenAI small) to $130 (OpenAI large) for the initial indexing pass alone. At scale, these differences compound quickly.
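The arithmetic behind those figures is worth writing down once, since it scales linearly with corpus size. The 1,000 tokens-per-document assumption is ours; substitute your own corpus statistics.

```python
# Back-of-the-envelope indexing cost for a 1M-document corpus.
# Assumes ~1,000 tokens per document -- adjust for your data.
DOCS = 1_000_000
TOKENS_PER_DOC = 1_000
total_tokens = DOCS * TOKENS_PER_DOC  # 1 billion tokens

prices_per_million = {
    "text-embedding-3-small": 0.02,
    "voyage-3-large": 0.06,
    "text-embedding-3-large": 0.13,
}

costs = {model: total_tokens / 1_000_000 * price
         for model, price in prices_per_million.items()}
for model, cost in costs.items():
    print(f"{model}: ${cost:,.0f}")
```

Note this covers only the initial pass: every re-index after an embedding model switch pays the full amount again, which is why the lock-in concern above is not hypothetical.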
Dimensions and Matryoshka Learning Matter
Most modern embedding models now support Matryoshka Representation Learning, which lets you truncate vectors to smaller dimensions without retraining. This is a real cost savings at the vector database layer - storing 768-dimensional vectors costs 75% less than storing 3072-dimensional ones, and similarity search runs faster on shorter vectors.
Gemini Embedding 001 supports 768, 1536, or 3072 dimensions. Qwen3-Embedding-8B goes all the way down to 32 dimensions. Cohere Embed v4 offers 256, 512, 1024, and 1536. If your use case doesn't demand peak accuracy, dropping to lower dimensions is often the best first optimization.
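The truncation itself is mechanically trivial: keep the first k dimensions and re-normalize to unit length. The quality preservation comes from Matryoshka training, not from this code - a pure-Python sketch of the mechanics, with deterministic stand-in values in place of real model output:

```python
import math

def truncate_matryoshka(embedding, dims):
    """Keep the first `dims` components and re-normalize to unit length,
    the truncation scheme Matryoshka-trained models are built to tolerate."""
    v = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Stand-in for a 3072-dim embedding (real values would come from the model).
full = [math.sin(i) for i in range(3072)]

# Gemini's documented 3072 -> 768 truncation path from the section above.
short = truncate_matryoshka(full, 768)
print(len(short))                                       # 768
print(round(math.sqrt(sum(x * x for x in short)), 6))   # 1.0
```

Re-normalizing after truncation matters: cosine similarity assumes unit vectors, and the first k components of a unit vector are not themselves unit length.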
Cohere's Multimodal Edge
Cohere Embed v4 is the only major commercial embedding model that handles both text and images natively in the same vector space. If you're building a system that needs to search across PDFs with charts, product images alongside descriptions, or mixed-media knowledge bases, Embed v4 is currently the only option that doesn't require separate models for text and image embeddings. It supports over 100 languages and offers binary quantization for further compression.
At 65.20 on MTEB for text, it trails the leaders. But if multimodal is a requirement, it's the clear pick.
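Binary quantization, which Cohere exposes as an output option and which can be applied to any embedding at some accuracy cost, reduces each dimension to a single bit: threshold at zero, then compare vectors by bit overlap instead of cosine similarity. A toy pure-Python illustration (real deployments pack the bits into integers and use hardware popcount; the vectors here are hand-made):

```python
def binarize(vec):
    """Binary quantization: 1 bit per dimension (positive -> 1, else 0).
    A 1024-dim float32 vector shrinks from 4096 bytes to 128 bytes."""
    return [1 if x > 0 else 0 for x in vec]

def hamming_similarity(a, b):
    """Fraction of matching bits -- a cheap stand-in for cosine similarity."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

q  = binarize([0.12, -0.40, 0.33, -0.08])
d1 = binarize([0.20, -0.10, 0.25, -0.30])  # same sign pattern as the query
d2 = binarize([-0.50, 0.60, -0.20, 0.10])  # opposite sign pattern
print(hamming_similarity(q, d1))  # 1.0
print(hamming_similarity(q, d2))  # 0.0
```

The 32x storage reduction is why binary quantization is usually paired with a rescoring step: retrieve a generous candidate set with Hamming distance, then re-rank the survivors with full-precision vectors.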
Practical Guidance
For standard RAG pipelines: Gemini Embedding 001 offers the best retrieval scores among API models. If Google lock-in isn't acceptable, Voyage-3-large at $0.06/M tokens delivers strong quality at half the cost of OpenAI's large model.
For budget-sensitive applications: OpenAI text-embedding-3-small at $0.02/M tokens is hard to beat on price-to-performance. It scores 62.26 on MTEB - not spectacular, but enough for many production search and retrieval use cases.
For self-hosting: Qwen3-Embedding-8B or NV-Embed-v2 are the top choices. Qwen3 is especially flexible with its 32-4096 dimension range and 32K context window. Both are truly competitive with the best commercial APIs.
For multilingual applications: NVIDIA's Llama-Embed-Nemotron-8B ranks first on the multilingual MTEB across 250+ languages. It's open-weight and free. For a commercial API option, Cohere Embed v4 covers 100+ languages.
For code search: Qwen3-Embedding-8B scores 80.68 on the MTEB Code benchmark, making it the strongest option for code-related retrieval tasks.
For agent frameworks and tool-augmented systems: embedding quality directly affects how well agents retrieve context from knowledge bases. When choosing an LLM for your stack, don't overlook the embedding model - it's often the bottleneck.
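Whichever model you pick from the guidance above, the retrieval step it feeds is the same: cosine similarity between a query vector and each document vector, keep the top k. A model-agnostic pure-Python sketch, with hand-made low-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query; return top-k indices."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Hand-made 3-dim stand-ins for real embedding vectors.
docs = [
    [0.9, 0.1, 0.0],   # doc 0: points nearly the same way as the query
    [0.1, 0.9, 0.2],   # doc 1: nearly orthogonal
    [0.8, 0.2, 0.1],   # doc 2: close to the query direction
]
query = [1.0, 0.0, 0.0]
print(top_k(query, docs))  # [0, 2]
```

Production systems replace this exhaustive scan with an approximate nearest-neighbor index, but the scoring function - and therefore the embedding model's influence on result quality - is identical.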
Methodology Note
Rankings are based on publicly reported MTEB scores as of March 2026. The English MTEB benchmark covers 56 datasets across 8 task categories. The multilingual MTEB (MMTEB) covers 131 tasks across 250+ languages. Models marked with asterisks use different leaderboard versions, and direct score comparisons across leaderboard versions should be treated with caution. Pricing is based on published API rates and may vary by provider tier or volume.
Sources
- MTEB Leaderboard - Hugging Face
- Gemini Embedding 001 - Google Developers Blog
- NV-Embed-v2 - Hugging Face
- Qwen3 Embedding - Qwen Blog
- Voyage-3-large Announcement
- Jina Embeddings v3
- Cohere Embed v4 Changelog
- OpenAI Embedding Models
- Llama-Embed-Nemotron-8B - Hugging Face Blog
- Voyage AI Pricing
- OpenAI API Pricing
✓ Last verified March 9, 2026
