
China's GLM-5 Rivals GPT-5.2 on Zero Nvidia Silicon

Zhipu AI's 744B open-source model GLM-5 was trained entirely on Huawei Ascend chips and now competes with GPT-5.2 and Claude Opus on major benchmarks.

The US chip embargo was supposed to keep China a generation behind in AI. Zhipu AI just shipped a 744-billion-parameter model trained on 100,000 Huawei Ascend 910B chips - zero Nvidia, zero AMD - and it scores within single digits of GPT-5.2 on the benchmarks that matter.

TL;DR

  • GLM-5 is a 744B MoE model with 44B active parameters, trained on 28.5 trillion tokens using exclusively Huawei Ascend 910B hardware
  • Scores 77.8% on SWE-bench Verified and 50.4% on Humanity's Last Exam (with tools), beating GPT-5.2 on both
  • Released under the MIT license - one of the most permissive open-source licenses - and priced 5-6x cheaper than Western frontier models
  • Zhipu stock surged 28.7% on the Hong Kong exchange following the announcement

The Numbers

GLM-5 doesn't beat every Western model on every test. But it no longer needs to. The gap between a Chinese open-source model and the best closed labs has collapsed to rounding errors on several key benchmarks.

Benchmark                     GLM-5   GPT-5.2   Claude Opus 4.5
SWE-bench Verified            77.8%   76.2%     80.9%
Humanity's Last Exam (tools)  50.4%   47.8%     46.2%
BrowseComp                    75.9    72.1      68.4
Terminal-Bench 2.0            56.2%   64.7%     65.4%
AIME 2025                     88.7%   100%      92.3%
GPQA                          68.2%   71.5%     69.8%

On the tasks tracked by our coding benchmarks leaderboard, GLM-5 sits just behind Claude Opus 4.5 and slightly ahead of GPT-5.2 on SWE-bench Verified. On Humanity's Last Exam - a test designed to remain difficult for frontier models - GLM-5 leads the field outright.

Architecture

GLM-5 uses a Mixture-of-Experts design with 256 total experts, 8 of which are active per token. Combined with the model's shared layers, that routing means only 44 of the 744 billion parameters - about 5.9% - fire on any given inference pass, keeping compute costs manageable despite the model's raw scale. The context window extends to 200,000 tokens with a maximum output of 131,000 tokens.
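The routing step can be sketched in a few lines. This is a generic top-k gating illustration, not Zhipu's implementation: the hidden size and token count are toy values, and only the 256-expert / 8-active configuration comes from the article.

```python
import numpy as np

def moe_route(x, gate_w, top_k=8):
    """Top-k gating: each token is dispatched to only top_k experts,
    so the remaining expert parameters stay idle for that token."""
    logits = x @ gate_w                                # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # chosen expert indices
    sel = np.take_along_axis(logits, top, axis=-1)     # their gate logits
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))  # softmax over selected
    w /= w.sum(axis=-1, keepdims=True)                 # mixing weights
    return top, w

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))           # 4 tokens, toy hidden size 64
gate_w = rng.normal(size=(64, 256))    # 256 experts, as in GLM-5
experts, weights = moe_route(x, gate_w)
print(experts.shape, weights.shape)    # (4, 8) (4, 8)
```

Each token's output is then the weighted sum of its 8 selected experts' outputs; the other 248 experts contribute no compute for that token.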

Two technical choices stand out. Multi-head Latent Attention cuts memory requirements by 33% compared to standard attention. DeepSeek Sparse Attention - borrowed from the open-source playbook that DeepSeek pioneered - enables efficient long-context processing without dense attention overhead.
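To see why a smaller per-token key-value state matters at a 200,000-token context, here is a back-of-the-envelope cache-size calculation. The layer count and KV width are illustrative assumptions; only the roughly one-third reduction and the context length come from the article.

```python
def kv_cache_bytes(layers, seq_len, kv_dim, bytes_per_val=2):
    """fp16 KV cache: keys + values of width kv_dim, per layer, per token."""
    return layers * seq_len * 2 * kv_dim * bytes_per_val

layers, ctx = 60, 200_000               # assumed depth; GLM-5's 200K context
full_dim = 8192                         # assumed per-token KV width
latent_dim = int(full_dim * (1 - 0.33)) # latent state ~33% smaller, per the article

standard = kv_cache_bytes(layers, ctx, full_dim)
latent = kv_cache_bytes(layers, ctx, latent_dim)
print(f"standard: {standard / 2**30:.0f} GiB, latent: {latent / 2**30:.0f} GiB")
```

Whatever the real dimensions, the saving scales linearly with context length, which is why latent attention pays off most on long-context workloads.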

The Hallucination Problem

Zhipu developed a reinforcement learning framework called Slime specifically to reduce hallucinations. The results are dramatic: GLM-5 reports a 34% hallucination rate, down from 90% on its predecessor GLM-4.7. For comparison, Claude Sonnet 4.5 sits around 42% and GPT-5.2 around 48% on the same evaluation. If these numbers hold under independent testing, GLM-5 would have the lowest hallucination rate of any frontier model currently available.

[Image: server racks in a data center.] Zhipu coordinated 100,000 Huawei Ascend 910B processors to train GLM-5 - an unprecedented scale for non-Nvidia hardware.

Built on Huawei, Start to Finish

The headline feature is not the architecture. It's the hardware.

Every parameter in GLM-5 was trained on Huawei Ascend 910B processors running the MindSpore framework. No Nvidia H100s, no AMD MI300X chips, no American silicon anywhere in the training stack. Zhipu also confirmed compatibility with processors from Moore Threads, Cambricon, Kunlun, MetaX, Enflame, and Hygon - all Chinese chipmakers.

The engineering challenges were sizable. Making 100,000 Ascend chips work together reliably enough to complete a training run of 28.5 trillion tokens required Zhipu to develop custom optimization techniques, including dynamic graph multi-level pipelined deployment and high-performance fusion operators built specifically for Ascend's architecture.

"Unlike OpenAI's closed system, we adopt an open strategy to advance science and technology, fostering industry-academia collaboration while focusing on continuously enhancing the capabilities of our strongest foundational model," said CEO Zhang Peng.

The energy cost remains a real constraint. The massive domestic clusters consume significantly more power than equivalent Nvidia-based systems, and Zhipu has acknowledged that breakthroughs in power management remain necessary.

Pricing and Access

GLM-5 undercuts every Western frontier model on price.

                        GLM-5            GPT-5.2      Claude Opus 4.6
Input (per 1M tokens)   $1.00            ~$6.00       ~$6.00
Output (per 1M tokens)  $3.20            ~$30.00      ~$30.00
License                 MIT              Proprietary  Proprietary
Free tier               Yes (chat.z.ai)  Limited      Limited
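A quick way to feel the price gap is to cost out a single request from the table above. The 20K-in / 2K-out token mix is an assumed example; the Western prices are the approximate figures listed.

```python
PRICES = {  # USD per 1M tokens, from the pricing table (Western prices approximate)
    "GLM-5":       {"in": 1.00, "out": 3.20},
    "GPT-5.2":     {"in": 6.00, "out": 30.00},
    "Claude Opus": {"in": 6.00, "out": 30.00},
}

def request_cost(model, in_tokens, out_tokens):
    """Cost in USD for one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000

# An example agentic coding turn: 20K tokens of context in, 2K tokens out
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f}")
```

With this mix, GLM-5 comes out at $0.0264 per request versus $0.18 for the Western models - a bit under 7x; the article's 5-6x figure will vary with the input/output ratio.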

The model weights are available on HuggingFace under zai-org/GLM-5 and on ModelScope. API access runs through OpenAI-compatible endpoints at Z.ai and OpenRouter, and the model is compatible with vLLM for self-hosting. GLM-5 also builds on the foundation of models like GLM-4.7-Flash, which already demonstrated that Zhipu could deliver competitive performance at the smaller end of the scale.
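Because the API is OpenAI-compatible, a request is just a standard chat-completions body. The base URL and model id below are assumptions for illustration - check the Z.ai or OpenRouter docs for the real values - and this sketch only constructs the payload rather than sending it.

```python
import json

# Hypothetical endpoint; any OpenAI-compatible client (the openai SDK, curl,
# or a self-hosted vLLM server) accepts this body at POST {BASE_URL}/chat/completions
# with a bearer token in the Authorization header.
BASE_URL = "https://api.z.ai/v1"

payload = {
    "model": "glm-5",  # assumed model id
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a function that reverses a linked list."},
    ],
    "max_tokens": 1024,
    "temperature": 0.2,
}

print(json.dumps(payload, indent=2))
```

Self-hosting with vLLM exposes the same endpoint shape, so client code can switch between the hosted API and a local deployment by changing only the base URL.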

Inference speed is the tradeoff. GLM-5 produces 17-19 tokens per second compared to 25-30+ for GPT-5.2 and Claude Opus, a gap that reflects both the MoE routing overhead and the Ascend hardware's lower per-chip throughput.

[Image: stock market data on a trading screen.] Zhipu stock surged 28.7% on the Hong Kong exchange within 24 hours of GLM-5's release.

What It Does Not Tell You

Three caveats deserve attention.

Benchmark Selection

GLM-5 excels at coding and agentic benchmarks - SWE-bench, BrowseComp, Vending Bench - but trails meaningfully on pure reasoning tasks. On AIME 2025, GPT-5.2 scores a perfect 100% while GLM-5 manages 88.7%. On Terminal-Bench 2.0, both GPT-5.2 and Claude Opus hold a roughly 8-9 point advantage. The model's strengths are real, but they're concentrated in specific domains. For a deeper look at how to interpret these scores, see our guide to understanding AI benchmarks.

Independent Verification

Most of the benchmark numbers come from Zhipu's own evaluations. The hallucination reduction claims - a drop from 90% to 34% - particularly need independent validation. Until third-party evaluators such as LMSYS's Chatbot Arena confirm these results, treat them as manufacturer-reported specifications rather than established facts.

Compute Is Still Tight

Zhipu has acknowledged that compute constraints are limiting rollout. The model launched first for coding subscribers, with broader availability following gradually. Running 100,000 Ascend chips at scale isn't cheap, and the energy overhead of domestic silicon remains a structural disadvantage that pricing alone doesn't reflect.


GLM-5 does not end the AI race. It does prove that US chip export controls haven't prevented China from reaching frontier-class performance - at least in specific domains - using an entirely domestic hardware stack. The model is open-source, aggressively priced, and good enough on coding and agentic tasks to warrant serious evaluation by anyone currently paying Western frontier rates. The question is no longer whether Chinese labs can compete at the frontier. It's whether the rest of the industry has priced in how fast they're closing the gap.

About the author

Elena, Senior AI Editor and Investigative Journalist, is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.