Embedding Model Leaderboard: MTEB Rankings April 2026
April 2026 rankings of the top embedding models by MTEB score - Gemini Embedding 001, NV-Embed-v2, Qwen3-Embedding-8B, and the new Jina v4 multimodal release compared for RAG and search.

Embedding models don't get the same attention as chatbots or reasoning engines, but they're the backbone of every retrieval-augmented generation pipeline, every semantic search system, and most production AI applications that need to find relevant information quickly. Pick the wrong embedding model and your RAG app returns garbage. Pick an expensive one and your costs scale linearly with every document you index.
The Massive Text Embedding Benchmark (MTEB) is still the standard way to compare these models. Compared to our March 2026 rankings, the top of the English leaderboard has barely moved - Gemini Embedding 001 still holds the #1 spot at 68.32. What has changed sits in the middle of the table: Jina released v4 with native multimodal support, Voyage shipped a 3.1 update that narrowed its gap with Gemini, and the open-weight NVIDIA and Qwen releases kept their standing while community serving infrastructure (BGE-M3 endpoints, vLLM embedding support) matured enough to make self-hosting a serious default.
TL;DR - April 2026 refresh
- Gemini Embedding 001 still leads the English MTEB leaderboard at 68.32 (unchanged from March)
- Jina Embeddings v4 landed in April with native multimodal (text + image) support, unlocking a second credible commercial option alongside Cohere Embed v4
- Voyage-3.1 reduced the price-to-quality gap for teams that need an API alternative to Gemini
- Self-hosting Qwen3-Embedding-8B is now the default cost path for teams with GPU capacity - vLLM and SGLang both shipped first-class embedding support in Q1 2026
What MTEB Actually Measures
MTEB evaluates embedding models across eight task categories - retrieval, semantic textual similarity, classification, clustering, pair classification, reranking, bitext mining, and summarization. The English MTEB leaderboard covers 56 datasets across these categories. The multilingual MMTEB spans 131 tasks across 250+ languages and uses Borda count for aggregation, which rewards consistency across tasks rather than dominance on a few.
One caveat unchanged from last month: MTEB scores are self-reported. Model providers submit their own results, and while the evaluation code is open source, there's no independent verification step. Treat vendor press releases accordingly.
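The Borda-count aggregation MMTEB uses can be sketched in a few lines. This is an illustrative simplification with made-up scores: on each task the worst model gets 0 points and the best gets n-1, and points are summed across tasks; the real leaderboard's handling of ties and missing tasks may differ.

```python
def borda_rank(scores_by_task):
    """scores_by_task: {task: {model: score}} -> {model: borda_points}.
    Per task, models are ranked by score; the worst model earns 0
    points, the best earns n_models - 1, summed across all tasks."""
    totals = {}
    for scores in scores_by_task.values():
        ranked = sorted(scores, key=scores.get)  # ascending: worst first
        for points, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + points
    return totals

# Toy scores for three hypothetical models on three task categories.
scores = {
    "retrieval":  {"A": 0.70, "B": 0.65, "C": 0.60},
    "clustering": {"A": 0.56, "B": 0.58, "C": 0.55},
    "sts":        {"A": 0.80, "B": 0.79, "C": 0.40},
}
print(borda_rank(scores))  # A leads with 5 points, B has 4, C has 0
```

Notice how consistency wins: model A takes first place on only two of three tasks, but never ranks last, so it beats a model that dominates a single category.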
Rankings: April 2026
| Rank | Model | Provider | MTEB Avg | Retrieval | Dims | Max Tokens | Pricing (per 1M tokens) |
|---|---|---|---|---|---|---|---|
| 1 | Gemini Embedding 001 | Google | 68.32 | 67.71 | 3072 | 8192 | ~$0.004/1K chars |
| 2 | NV-Embed-v2 | NVIDIA | 72.31* | 62.65 | 4096 | 32768 | Free (open-weight) |
| 3 | Qwen3-Embedding-8B | Qwen/Alibaba | 70.58** | - | 4096 | 32768 | Free (open-weight) |
| 4 | BGE-en-ICL | BAAI | 71.24* | - | 4096 | 32768 | Free (open-weight) |
| 5 | GTE-Qwen2-7B-instruct | Alibaba | 70.24* | - | 3584 | 32768 | Free (open-weight) |
| 6 | Voyage-3.1-large | Voyage AI | 67.40 | - | 2048 | 32768 | $0.05 |
| 7 | Jina Embeddings v4 | Jina AI | 66.81 | - | 1024 | 32768 | Free tier + paid |
| 8 | Voyage-3-large | Voyage AI | 66.80 | - | 2048 | 32768 | $0.06 |
| 9 | Cohere Embed v4 | Cohere | 65.20 | - | 1024 | 512 | $0.12 |
| 10 | text-embedding-3-large | OpenAI | 64.60 | - | 3072 | 8191 | $0.13 |
| 11 | BGE-M3 | BAAI | 63.00 | - | 1024 | 8192 | Free (open-weight) |
| 12 | Nomic Embed v1.5 | Nomic AI | 62.39 | - | 768 | 8192 | $0.05 |
| 13 | text-embedding-3-small | OpenAI | 62.26 | - | 1536 | 8191 | $0.02 |
*NV-Embed-v2 and BGE-en-ICL scores are from the legacy MTEB leaderboard (56 tasks). **Qwen3-Embedding-8B score is from the MMTEB multilingual leaderboard. Cross-leaderboard comparisons should be treated as approximate. Note that Google quotes Gemini Embedding pricing per 1K characters rather than per 1M tokens, so its pricing cell is not directly comparable to the others.
The table looks broadly similar to March, and that's the point. Embedding-model benchmarks move on a different cadence than chatbot leaderboards. Major MTEB releases are quarterly events, not weekly ones. What did change:
- Voyage-3.1-large is a meaningful update: +0.6 on MTEB average over 3-large, $0.01 per-million-token reduction in price, and improved retrieval on long documents. It's now the highest-quality API alternative to Gemini Embedding.
- Jina Embeddings v4 landed in April with native multimodal support across text and image in a single vector space, echoing Cohere's approach but with a much larger context window (32K vs Cohere's 512). For mixed-media retrieval, this is the new "best if you're not already on Cohere" option.
Key Takeaways
The Top Tier Is Stable, The Middle Shifted
Gemini Embedding 001's +5.09-point lead over the next English-MTEB competitor in March has narrowed, with Voyage-3.1-large now at 67.40, but Gemini is still clearly #1 on pure retrieval scores. For production RAG workloads where retrieval quality dominates cost, Gemini remains the default pick.
The more interesting motion happened one tier down. Voyage-3.1 and Jina v4 are both credible alternatives to Gemini for teams that want a non-Google API, which matters for multi-cloud deployments and for organizations with specific data-residency requirements Google doesn't meet.
Self-Hosting Is Now The Default Cost Path
vLLM 0.19.0 and SGLang 0.5.10 both shipped first-class embedding endpoints in Q1 2026. Deploying Qwen3-Embedding-8B on a single A100 now produces batched throughput on par with commercial APIs at roughly 1/20th the cost per million tokens once amortised. For any team already running inference infrastructure for chat models, adding an embedding model to the same GPU pool is straightforward. The per-token economics of commercial embedding APIs are increasingly hard to justify above ~100M tokens/month of indexing traffic.
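For a sense of what the self-hosted path looks like from the client side, here is a hedged sketch of a request to a vLLM server, which exposes an OpenAI-compatible /v1/embeddings route. The host, port, and served model name are assumptions for a local deployment and will differ in your setup.

```python
import json
import urllib.request

# Assumed local vLLM deployment serving an embedding model.
BASE_URL = "http://localhost:8000/v1/embeddings"

payload = {
    "model": "Qwen/Qwen3-Embedding-8B",  # served model name (assumption)
    "input": ["What is Matryoshka representation learning?"],
}

req = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would return an OpenAI-style body:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
print(req.get_method(), req.full_url)
```

Because the route is OpenAI-compatible, existing client code written against commercial embedding APIs typically needs only a base-URL and model-name change to target the self-hosted endpoint.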
Matryoshka Is Now Table Stakes
Matryoshka Representation Learning (which lets you truncate embeddings to smaller dimensions without retraining) is now supported by every major model in this leaderboard. Gemini supports 768/1536/3072, Qwen3-Embedding-8B goes as low as 32 dims, Voyage supports 256-2048, Cohere offers 256/512/1024/1536, and Jina v4 ships with 64-1024.
The practical consequence: pick whichever model fits your quality budget, then truncate to the smallest dimension your retrieval quality still tolerates. A 768-dim index costs 25% of what a 3072-dim index costs in vector-DB storage and search latency, and the MRL truncation quality loss is usually under 1% on standard retrieval benchmarks.
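A minimal sketch of MRL-style truncation, assuming cosine-similarity retrieval over normalised vectors. The 3072-to-768 sizes mirror Gemini's published Matryoshka dimensions; the vectors here are random placeholders standing in for real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder "full-size" embeddings: 4 documents x 3072 dims.
full = rng.normal(size=(4, 3072)).astype(np.float32)

def truncate_mrl(embeddings, dim):
    """Keep the first `dim` components of each row, then
    L2-renormalise so cosine similarity still behaves."""
    small = embeddings[:, :dim]
    norms = np.linalg.norm(small, axis=1, keepdims=True)
    return small / norms

small = truncate_mrl(full, 768)
print(small.shape)  # (4, 768)
```

The key property of MRL-trained models is that the leading components carry most of the semantic signal, which is why a plain prefix slice (rather than any learned projection) is all the truncation requires.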
The Multimodal Category Has a Second Option
Cohere Embed v4 was the only commercial embedding model with native text-and-image support at the start of 2026. The April release of Jina Embeddings v4 broke that monopoly. Jina v4's main advantages are a 32K max-token input (versus Cohere's 512) and a free tier that lets you evaluate it before committing. Cohere still has the edge on language coverage (100+ languages with production-grade performance).
For teams evaluating multimodal embedding today: Cohere v4 if you need the most languages and don't mind the 512-token cap; Jina v4 if you need long-document multimodal and want to try before you buy.
Pricing Spread Narrowed, But Only Slightly
The most expensive competitive option (OpenAI text-embedding-3-large, $0.13/M tokens) still costs 6.5x as much as the cheapest (OpenAI text-embedding-3-small, $0.02/M tokens). Voyage's 3.1 pricing of $0.05 is the most interesting middle-tier move - it matches the Nomic price point while delivering noticeably higher retrieval quality.
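The spread is easiest to feel as a monthly bill. A back-of-envelope calculation at the list prices in the table above, for a hypothetical 500M-token/month indexing workload:

```python
# USD per 1M tokens, from the rankings table above.
PRICES_PER_M = {
    "text-embedding-3-small": 0.02,
    "voyage-3.1-large": 0.05,
    "text-embedding-3-large": 0.13,
}
MONTHLY_TOKENS_M = 500  # assumed workload: 500M tokens/month

for model, price in PRICES_PER_M.items():
    print(f"{model}: ${price * MONTHLY_TOKENS_M:.2f}/month")
# text-embedding-3-small: $10.00/month
# voyage-3.1-large: $25.00/month
# text-embedding-3-large: $65.00/month
```

At this volume the absolute dollar difference is modest; the 6.5x multiplier only starts to dominate decisions once indexing traffic reaches billions of tokens per month or re-indexing is frequent.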
Practical Guidance
For standard RAG pipelines: Gemini Embedding 001 is still the best retrieval quality among API models. If you can't use Google (vendor lock-in concerns, procurement, data residency), Voyage-3.1-large at $0.05/M tokens is now the closest substitute.
For budget-sensitive applications: OpenAI text-embedding-3-small at $0.02/M tokens remains the cheapest defensible choice. For teams with GPU capacity, self-hosting Qwen3-Embedding-8B is typically cheaper once indexing volumes exceed ~100M tokens/month.
For self-hosting: Qwen3-Embedding-8B is the first-choice open-weight model. It now has first-class vLLM and SGLang support, flexible dimensions from 32 to 4096, and a 32K context window. NV-Embed-v2 remains a strong alternative if you want the highest raw MTEB-legacy score.
For multilingual applications: NVIDIA's Llama-Embed-Nemotron-8B still leads the multilingual MMTEB. For a commercial API, Cohere Embed v4 covers 100+ languages.
For multimodal (text + images): Cohere Embed v4 or Jina Embeddings v4. Pick Jina for long-document multimodal (32K context), Cohere for maximum language coverage.
For code search: Qwen3-Embedding-8B scores 80.68 on MTEB Code - still the strongest option for code-related retrieval.
For agent frameworks and tool-augmented systems: embedding quality directly affects how well agents retrieve context from knowledge bases. When choosing an LLM for your stack, don't overlook the embedding model - it's often the bottleneck.
What Changed Since March 2026
- Voyage-3.1-large (April) narrowed the price-to-quality gap against Gemini Embedding; new de facto "best non-Google API" pick
- Jina Embeddings v4 (April) broke Cohere's monopoly on production-grade multimodal embeddings
- vLLM 0.19.0 / SGLang 0.5.10 (both Q1 2026) shipped native embedding endpoints, materially lowering the barrier to self-hosting Qwen3-Embedding-8B or BGE-M3
Methodology Note
Rankings are based on publicly reported MTEB scores as of April 2026. The English MTEB benchmark covers 56 datasets across 8 task categories. The multilingual MTEB (MMTEB) covers 131 tasks across 250+ languages. Models marked with asterisks use different leaderboard versions, and direct score comparisons across leaderboard versions should be treated with caution. Pricing reflects published API rates as of 22 April 2026 and may vary by provider tier or volume.
Sources
- MTEB Leaderboard - Hugging Face
- Gemini Embedding 001 - Google Developers Blog
- NV-Embed-v2 - Hugging Face
- Qwen3 Embedding - Qwen Blog
- Voyage-3-large Announcement
- Jina Embeddings v3
- Cohere Embed v4 Changelog
- OpenAI Embedding Models
- Llama-Embed-Nemotron-8B - Hugging Face Blog
- Voyage AI Pricing
- OpenAI API Pricing
Last verified April 23, 2026
