Embedding Model Leaderboard: MTEB Rankings April 2026
April 2026 rankings of the top embedding models by MTEB score - Gemini Embedding 001, NV-Embed-v2, Qwen3-Embedding-8B, and the new Jina v4 multimodal release compared for RAG and search.

Embedding models don't get the same attention as chatbots or reasoning engines, but they're the backbone of every retrieval-augmented generation pipeline, every semantic search system, and most production AI applications that need to find relevant information quickly. Pick the wrong embedding model and your RAG app returns garbage. Pick an expensive one and your costs scale linearly with every document you index.
The Massive Text Embedding Benchmark (MTEB) is still the standard way to compare these models. Compared to our March 2026 rankings, the top of the English leaderboard has barely moved - Gemini Embedding 001 still holds the #1 spot at 68.32. What has changed sits in the middle of the table: Jina released v4 with native multimodal support, Voyage shipped a 3.1 update that narrowed its gap with Gemini, and the open-weight NVIDIA and Qwen releases kept their standing while community serving infrastructure (BGE-M3 endpoints, vLLM embedding support) matured enough to make self-hosting a serious default.
TL;DR - April 2026 refresh
- Gemini Embedding 001 still leads the English MTEB leaderboard at 68.32 (unchanged from March)
- Jina Embeddings v4 landed in April with native multimodal (text + image) support, unlocking a second credible commercial option alongside Cohere Embed v4
- Voyage-3.1 reduced the price-to-quality gap for teams that need an API alternative to Gemini
- Self-hosting Qwen3-Embedding-8B is now the default cost path for teams with GPU capacity - vLLM and SGLang both shipped first-class embedding support in Q1 2026
What MTEB Actually Measures
MTEB evaluates embedding models across eight task categories - retrieval, semantic textual similarity, classification, clustering, pair classification, reranking, bitext mining, and summarization. The English MTEB leaderboard covers 56 datasets across these categories. The multilingual MMTEB spans 131 tasks across 250+ languages and uses Borda count for aggregation, which rewards consistency across tasks rather than dominance on a few.
One caveat unchanged from last month: MTEB scores are self-reported. Model providers submit their own results, and while the evaluation code is open source, there's no independent verification step. Treat vendor press releases accordingly.
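The Borda-count aggregation MMTEB uses can be sketched in a few lines. This is an illustrative simplification with made-up scores: on each task the worst model gets 0 points and the best gets n-1, and points are summed across tasks; the real leaderboard's handling of ties and missing tasks may differ.

```python
def borda_rank(scores_by_task):
    """scores_by_task: {task: {model: score}} -> {model: borda_points}.
    Per task, models are ranked by score; the worst model earns 0
    points, the best earns n_models - 1, summed across all tasks."""
    totals = {}
    for scores in scores_by_task.values():
        ranked = sorted(scores, key=scores.get)  # ascending: worst first
        for points, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + points
    return totals

# Toy scores for three hypothetical models on three task categories.
scores = {
    "retrieval":  {"A": 0.70, "B": 0.65, "C": 0.60},
    "clustering": {"A": 0.56, "B": 0.58, "C": 0.55},
    "sts":        {"A": 0.80, "B": 0.79, "C": 0.40},
}
print(borda_rank(scores))  # A leads with 5 points, B has 4, C has 0
```

Notice how consistency wins: model A takes first place on only two of three tasks, but never ranks last, so it beats a model that dominates a single category.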
Rankings: April 2026
| Rank | Model | Provider | MTEB Avg | Retrieval | Dims | Max Tokens | Pricing (per 1M tokens) |
|---|---|---|---|---|---|---|---|
| 1 | Gemini Embedding 001 | Google | 68.32 | 67.71 | 3072 | 8192 | ~$0.004/1K chars |
| 2 | NV-Embed-v2 | NVIDIA | 72.31* | 62.65 | 4096 | 32768 | Free (open-weight) |
| 3 | Qwen3-Embedding-8B | Qwen/Alibaba | 70.58** | - | 4096 | 32768 | Free (open-weight) |
| 4 | BGE-en-ICL | BAAI | 71.24* | - | 4096 | 32768 | Free (open-weight) |
| 5 | GTE-Qwen2-7B-instruct | Alibaba | 70.24* | - | 3584 | 32768 | Free (open-weight) |
| 6 | Voyage-3.1-large | Voyage AI | 67.40 | - | 2048 | 32768 | $0.05 |
| 7 | Jina Embeddings v4 | Jina AI | 66.81 | - | 1024 | 32768 | Free tier + paid |
| 8 | Voyage-3-large | Voyage AI | 66.80 | - | 2048 | 32768 | $0.06 |
| 9 | Cohere Embed v4 | Cohere | 65.20 | - | 1024 | 512 | $0.12 |
| 10 | text-embedding-3-large | OpenAI | 64.60 | - | 3072 | 8191 | $0.13 |
| 11 | BGE-M3 | BAAI | 63.00 | - | 1024 | 8192 | Free (open-weight) |
| 12 | Nomic Embed v1.5 | Nomic AI | 62.39 | - | 768 | 8192 | $0.05 |
| 13 | text-embedding-3-small | OpenAI | 62.26 | - | 1536 | 8191 | $0.02 |
*NV-Embed-v2 and BGE-en-ICL scores are from the legacy MTEB leaderboard (56 tasks). **Qwen3-Embedding-8B score is from the MMTEB multilingual leaderboard. Cross-leaderboard comparisons should be treated as approximate. Note that Google quotes Gemini Embedding pricing per 1K characters rather than per 1M tokens, so its pricing cell is not directly comparable to the others.
The table looks broadly similar to March, and that's the point. Embedding-model benchmarks move on a different cadence than chatbot leaderboards. Major MTEB releases are quarterly events, not weekly ones. What did change:
- Voyage-3.1-large is a meaningful update: +0.6 on MTEB average over 3-large, $0.01 per-million-token reduction in price, and improved retrieval on long documents. It's now the highest-quality API alternative to Gemini Embedding.
- Jina Embeddings v4 landed in April with native multimodal support across text and image in a single vector space, echoing Cohere's approach but with a much larger context window (32K vs Cohere's 512). For mixed-media retrieval, this is the new "best if you're not already on Cohere" option.
Key Takeaways
The Top Tier Is Stable, The Middle Shifted
Gemini Embedding 001's +5.09-point lead over the next English-MTEB competitor in March has narrowed, with Voyage-3.1-large now at 67.40, but Gemini is still clearly #1 on pure retrieval scores. For production RAG workloads where retrieval quality dominates cost, Gemini remains the default pick.
The more interesting motion happened one tier down. Voyage-3.1 and Jina v4 are both credible alternatives to Gemini for teams that want a non-Google API, which matters for multi-cloud deployments and for organizations with specific data-residency requirements Google doesn't meet.
Self-Hosting Is Now The Default Cost Path
vLLM 0.19.0 and SGLang 0.5.10 both shipped first-class embedding endpoints in Q1 2026. Deploying Qwen3-Embedding-8B on a single A100 now produces batched throughput on par with commercial APIs at roughly 1/20th the cost per million tokens once amortised. For any team already running inference infrastructure for chat models, adding an embedding model to the same GPU pool is straightforward. The per-token economics of commercial embedding APIs are increasingly hard to justify above ~100M tokens/month of indexing traffic.
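For a sense of what the self-hosted path looks like from the client side, here is a hedged sketch of a request to a vLLM server, which exposes an OpenAI-compatible /v1/embeddings route. The host, port, and served model name are assumptions for a local deployment and will differ in your setup.

```python
import json
import urllib.request

# Assumed local vLLM deployment serving an embedding model.
BASE_URL = "http://localhost:8000/v1/embeddings"

payload = {
    "model": "Qwen/Qwen3-Embedding-8B",  # served model name (assumption)
    "input": ["What is Matryoshka representation learning?"],
}

req = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would return an OpenAI-style body:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
print(req.get_method(), req.full_url)
```

Because the route is OpenAI-compatible, existing client code written against commercial embedding APIs typically needs only a base-URL and model-name change to target the self-hosted endpoint.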
Matryoshka Is Now Table Stakes
Matryoshka Representation Learning (which lets you truncate embeddings to smaller dimensions without retraining) is now supported by every major model in this leaderboard. Gemini supports 768/1536/3072, Qwen3-Embedding-8B goes as low as 32 dims, Voyage supports 256-2048, Cohere offers 256/512/1024/1536, and Jina v4 ships with 64-1024.
The practical consequence: pick whichever model fits your quality budget, then truncate to the smallest dimension your retrieval quality still tolerates. A 768-dim index costs 25% of what a 3072-dim index costs in vector-DB storage and search latency, and the MRL truncation quality loss is usually under 1% on standard retrieval benchmarks.
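A minimal sketch of MRL-style truncation, assuming cosine-similarity retrieval over normalised vectors. The 3072-to-768 sizes mirror Gemini's published Matryoshka dimensions; the vectors here are random placeholders standing in for real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder "full-size" embeddings: 4 documents x 3072 dims.
full = rng.normal(size=(4, 3072)).astype(np.float32)

def truncate_mrl(embeddings, dim):
    """Keep the first `dim` components of each row, then
    L2-renormalise so cosine similarity still behaves."""
    small = embeddings[:, :dim]
    norms = np.linalg.norm(small, axis=1, keepdims=True)
    return small / norms

small = truncate_mrl(full, 768)
print(small.shape)  # (4, 768)
```

The key property of MRL-trained models is that the leading components carry most of the semantic signal, which is why a plain prefix slice (rather than any learned projection) is all the truncation requires.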
The Multimodal Category Has a Second Option
Cohere Embed v4 was the only commercial embedding model with native text-and-image support at the start of 2026. The April release of Jina Embeddings v4 broke that monopoly. Jina v4's main advantages are a 32K max-token input (versus Cohere's 512) and a free tier that lets you evaluate it before committing. Cohere still has the edge on language coverage (100+ languages with production-grade performance).
For teams evaluating multimodal embedding today: Cohere v4 if you need the most languages and don't mind the 512-token cap; Jina v4 if you need long-document multimodal and want to try before you buy.
Pricing Spread Narrowed, But Only Slightly
The most expensive competitive option (OpenAI text-embedding-3-large, $0.13/M tokens) still costs 6.5x as much as the cheapest (OpenAI text-embedding-3-small, $0.02/M tokens). Voyage's 3.1 pricing of $0.05 is the most interesting middle-tier move - it matches the Nomic price point while delivering noticeably higher retrieval quality.
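The spread is easiest to feel as a monthly bill. A back-of-envelope calculation at the list prices in the table above, for a hypothetical 500M-token/month indexing workload:

```python
# USD per 1M tokens, from the rankings table above.
PRICES_PER_M = {
    "text-embedding-3-small": 0.02,
    "voyage-3.1-large": 0.05,
    "text-embedding-3-large": 0.13,
}
MONTHLY_TOKENS_M = 500  # assumed workload: 500M tokens/month

for model, price in PRICES_PER_M.items():
    print(f"{model}: ${price * MONTHLY_TOKENS_M:.2f}/month")
# text-embedding-3-small: $10.00/month
# voyage-3.1-large: $25.00/month
# text-embedding-3-large: $65.00/month
```

At this volume the absolute dollar difference is modest; the 6.5x multiplier only starts to dominate decisions once indexing traffic reaches billions of tokens per month or re-indexing is frequent.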
Practical Guidance
For standard RAG pipelines: Gemini Embedding 001 is still the best retrieval quality among API models. If you can't use Google (vendor lock-in concerns, procurement, data residency), Voyage-3.1-large at $0.05/M tokens is now the closest substitute.
For budget-sensitive applications: OpenAI text-embedding-3-small at $0.02/M tokens remains the cheapest defensible choice. For teams with GPU capacity, self-hosting Qwen3-Embedding-8B is typically cheaper once indexing volumes exceed ~100M tokens/month.
For self-hosting: Qwen3-Embedding-8B is the first-choice open-weight model. It now has first-class vLLM and SGLang support, flexible dimensions from 32 to 4096, and a 32K context window. NV-Embed-v2 remains a strong alternative if you want the highest raw MTEB-legacy score.
For multilingual applications: NVIDIA's Llama-Embed-Nemotron-8B still leads the multilingual MMTEB. For a commercial API, Cohere Embed v4 covers 100+ languages.
For multimodal (text + images): Cohere Embed v4 or Jina Embeddings v4. Pick Jina for long-document multimodal (32K context), Cohere for maximum language coverage.
For code search: Qwen3-Embedding-8B scores 80.68 on MTEB Code - still the strongest option for code-related retrieval.
For agent frameworks and tool-augmented systems: embedding quality directly affects how well agents retrieve context from knowledge bases. When choosing an LLM for your stack, don't overlook the embedding model - it's often the bottleneck.
What Changed Since March 2026
- Voyage-3.1-large (April) narrowed the price-to-quality gap against Gemini Embedding; new de facto "best non-Google API" pick
- Jina Embeddings v4 (April) broke Cohere's monopoly on production-grade multimodal embeddings
- vLLM 0.19.0 / SGLang 0.5.10 (both Q1 2026) shipped native embedding endpoints, materially lowering the barrier to self-hosting Qwen3-Embedding-8B or BGE-M3
Methodology Note
Rankings are based on publicly reported MTEB scores as of April 2026. The English MTEB benchmark covers 56 datasets across 8 task categories. The multilingual MTEB (MMTEB) covers 131 tasks across 250+ languages. Models marked with asterisks use different leaderboard versions, and direct score comparisons across leaderboard versions should be treated with caution. Pricing reflects published API rates as of 22 April 2026 and may vary by provider tier or volume.
Sources
- MTEB Leaderboard - Hugging Face
- Gemini Embedding 001 - Google Developers Blog
- NV-Embed-v2 - Hugging Face
- Qwen3 Embedding - Qwen Blog
- Voyage-3-large Announcement
- Jina Embeddings v3
- Cohere Embed v4 Changelog
- OpenAI Embedding Models
- Llama-Embed-Nemotron-8B - Hugging Face Blog
- Voyage AI Pricing
- OpenAI API Pricing
Last verified April 23, 2026
