Name: ERNIE 5.1
Author: Baidu

Baidu's ERNIE 5.1 dropped on May 9, 2026, and landed at the top of the Chinese model rankings on LMArena within 24 hours. The headline isn't the benchmark position - it's how Baidu got there. The company built ERNIE 5.1 at roughly 6% of the pre-training cost of comparable frontier models, a number that, if independently verified, would mark one of the more significant efficiency gains in recent LLM development.

TL;DR

#1 Chinese model on LMArena Search Arena (score 1,223, #4 globally), beating all other Chinese labs as of May 2026
Text-only MoE with ~800B total parameters and 128K context at $0.59/M input tokens - roughly 25x cheaper than Claude Opus 4.7 on input
Beats DeepSeek V4 Pro on agent benchmarks (τ³-bench, SpreadsheetBench-Verified) but ships without open weights

The model is the direct successor to ERNIE 5.0, but the two are quite different in intent. ERNIE 5.0 was Baidu's multimodal play - 2.4 trillion parameters processing text, images, audio, and video through a single unified architecture. ERNIE 5.1 strips that back to text only, reduces total parameters to roughly one-third, and refocuses on what Baidu's enterprise customers actually use: search, legal analysis, financial tasks, and agentic workflows. Whether that narrowing is a strength or a compromise depends on your workload.

Key Specifications

Specification	Details
Provider	Baidu
Model Family	ERNIE
Parameters	~800B total / ~36B active (estimated)
Context Window	128K tokens input / 65,536 tokens output
Input Price	$0.59/M tokens
Output Price	$2.65/M tokens
Release Date	2026-05-09
License	Proprietary, hosted-only
Access	ernie.baidu.com, Baidu AI Studio, Qianfan API

Benchmark Performance

The published numbers cluster around search, math, and agentic tasks - the domains Baidu is most interested in defending.

Benchmark	ERNIE 5.1	DeepSeek V4-Pro	Gemini 3.1 Pro	Claude Opus 4.7
LMArena Search Arena	1,223 (#4 global)	Not disclosed	Top 5	1,236
LMArena Text Arena	1,476 (#13 global)	Top 15	Top 5	Top 5
AIME26 (with tools)	99.6	Not disclosed	99.9	Not disclosed
SpreadsheetBench-Verified	72.5	Below 72.5	Not disclosed	Not disclosed
τ³-bench	Beats V4-Pro	Reference	Not disclosed	Not disclosed

The Search Arena result is the standout: 4th globally and first among Chinese models with a score of 1,223, behind Claude Opus 4.7 (1,236), GPT-5.5 Search (1,242), and Claude Opus 4.6 Search (1,255). On the LMArena Text Arena, ERNIE 5.1 Preview sits at 1,476 and 13th place overall.

Category rankings on LMArena tell a clearer story than the aggregate position. ERNIE 5.1 is ranked #1 globally in Legal and Government tasks, #4 in Business/Finance, #7 in Software/IT, and #9 in Math. If your workload sits in those verticals - especially legal - this is the cheapest path to frontier-level performance.

One caveat: Baidu doesn't publish open benchmarks on MMLU-Pro or GPQA Diamond with the same specificity as US labs. The AIME26 and arena scores are real and independently verified through LMArena's blind voting system, but the "approaches Gemini 3.1 Pro on GPQA and MMLU-Pro" claim is vendor-reported and can't be cross-checked.

ERNIE 5.1 LMArena Search Arena leaderboard showing #4 global ranking ERNIE 5.1 Search Arena results from the official release announcement, showing 1,223 Elo and #1 Chinese model. Source: felloai.com

Key Capabilities

Search and Retrieval

The Search Arena ranking reflects real capability, not marketing positioning. ERNIE 5.1 performs well on multi-hop retrieval tasks that require pulling information from across a long context and synthesizing it correctly. This is where the 128K window earns its place - the model doesn't just extend context, it uses it. If you're running RAG pipelines or document review workflows, ERNIE 5.1's search-first tuning is worth testing against comparable Western models at 3-4x the price point.

Agentic Tasks

On τ³-bench and SpreadsheetBench-Verified, ERNIE 5.1 surpasses DeepSeek V4 Pro - a meaningful result given DeepSeek V4's strong showing on agent evaluations earlier this year. Baidu credits a disaggregated fully-asynchronous RL framework that decouples the training, inference, reward, and agent loop subsystems. The practical effect is a model that holds up on longer multi-step tasks without losing coherence mid-chain.

This puts ERNIE 5.1 in conversation with the agent-capable frontier. Check our agentic AI benchmarks leaderboard for broader context on where it sits against WebArena and GAIA scores.

Legal and Financial Domains

First-place LMArena rankings in Legal and Government tasks are a meaningful signal for enterprise buyers. Chinese-language legal documents and cross-border compliance queries are exactly the kind of structured, high-stakes text that benefits from a model trained heavily on Chinese regulatory and case law corpora. For non-Chinese legal work, the advantage likely narrows - but the #4 Business/Finance ranking suggests the domain strength isn't purely language-specific.

Pricing and Availability

ERNIE 5.1 is available through three channels: the ernie.baidu.com chat interface, Baidu AI Studio's Playground, and the Qianfan API with an OpenAI-compatible endpoint. API model ID is ernie-5.1.

At $0.59/M input and $2.65/M output, it's priced well below Western closed models at similar capability levels:

Model	Input	Output
ERNIE 5.1	$0.59/M	$2.65/M
DeepSeek V4-Pro	$0.44/M	$0.87/M
GPT-5.5	~$10.00/M	~$30.00/M
Claude Opus 4.7	$15.00/M	$75.00/M

The catch is infrastructure. All inference runs in Baidu's Chinese data centers, and enterprise access may require account setup through Baidu AI Cloud. For teams with data residency requirements outside China, this isn't usable without a proxy layer. The model also ships without open weights, so on-premises or self-hosted deployment isn't possible.

Our cost efficiency leaderboard has a fuller breakdown of performance-per-dollar comparisons across current models.

Baidu headquarters at Shangdi, Beijing Baidu's main campus in the Shangdi technology district, Beijing, where ERNIE models are developed. Source: commons.wikimedia.org

The Cost Efficiency Story

The 6% training cost claim is the number I'd want third-party verification on before betting a production system on it. Baidu says ERNIE 5.1 uses an "Once-For-All elastic training" method - it extracts an ideal sub-network from ERNIE 5.0's multi-dimensional elastic architecture, which apparently lets the team inherit ERNIE 5.0's knowledge without repeating the full pre-training compute budget. The approach is technically plausible - similar ideas have been explored in academic work on elastic networks and knowledge distillation. But the 94% cost reduction figure has no independent audit, and Baidu's incentive to make this number sound large is obvious.

What we can verify: the model performs well at its price point, it's smaller than ERNIE 5.0 by roughly two-thirds in total parameters, and the LMArena search results are independently scored. That's enough to take the model seriously. Whether the cost efficiency is reproducible at scale outside Baidu's infrastructure is a different question.

Built at 6% of comparable training costs - the number Baidu wants you to remember, and the one I'd want independent verification on before treating it as a methodology, not just a marketing figure.

Strengths and Weaknesses

Strengths

#1 globally in Legal/Government tasks on LMArena - truly useful for high-stakes text work in those domains
Strong cost/performance ratio at $0.59/M input (better than comparable Western models by 10-25x)
Solid agentic performance, beating DeepSeek V4-Pro on τ³-bench and SpreadsheetBench-Verified
99.6 on AIME26 with tools shows real math reasoning capability
128K context with strong retrieval quality, not just raw window size

Weaknesses

Text-only at launch - a step back from ERNIE 5.0's multimodal design, and most frontier competitors ship vision as table stakes
Closed weights, hosted-only - no on-premises option for data-sensitive workloads
All inference in Chinese data centers, which creates compliance and latency issues for non-China deployments
Key benchmarks (GPQA Diamond, MMLU-Pro) not published independently - you're taking Baidu's word on broad capability claims
Account verification may require mainland Chinese credentials for some API access tiers

ERNIE 5.0 - predecessor model with multimodal capabilities and 2.4T parameters
DeepSeek V4 - primary competitor on agentic benchmarks
Legal AI LLM Leaderboard - where ERNIE 5.1 claims #1 globally
Cost Efficiency Leaderboard - full performance-per-dollar comparison
LLM Rankings June 2026 - overall market context and model positioning
Agentic AI Benchmarks Leaderboard - τ³-bench and GAIA rankings

FAQ

Is ERNIE 5.1 the best Chinese AI model available?

As of May 2026, yes - it holds the #1 spot among Chinese models on both LMArena Text and Search arenas. DeepSeek V4-Pro and GLM-5.2 are close competitors depending on the task.

Does ERNIE 5.1 support vision or multimodal inputs?

No. ERNIE 5.1 is text-only at launch, unlike its predecessor ERNIE 5.0 which handled text, images, audio, and video. Baidu has not announced a timeline for multimodal support.

Can I run ERNIE 5.1 on my own hardware?

No. The model weights are closed and Baidu provides only hosted access through ernie.baidu.com, Baidu AI Studio, and the Qianfan API.

How does ERNIE 5.1 pricing compare to GPT-5.5?

ERNIE 5.1 costs $0.59/M input and $2.65/M output. GPT-5.5 runs around $10/M input and $30/M output - roughly 17x more expensive on input and 11x on output.

What is the context window for ERNIE 5.1?

128K tokens for input and 65,536 tokens for output, matching ERNIE 5.0 on context but shorter than the 1M-token windows on DeepSeek V4-Pro and recent Qwen models.

Sources: