Z.ai released GLM-5.2 on June 13, 2026 - one day after the US Commerce Department suspended global access to Claude Fable 5. The timing was deliberate. Jie Tang, Zhipu's founder, opened his launch post with: "the sudden restriction of certain frontier models is deeply regrettable." The model ships under MIT license with open weights and day-one API support for eight coding agents, including Claude Code, Cline, and Roo Code.

TL;DR

744B total parameters, 40B active per token (MoE), 1M token context window - up from 200K in GLM-5.1
Priced at roughly 1/10th the cost of comparable US frontier tiers; included in GLM Coding Plan ($10-$80/month) at no extra charge
Claims BridgeBench Reasoning #1 at 42.8, beating Fable 5 - but Z.ai published no independent benchmark numbers at launch

Overview

GLM-5.2 is the third major release in Z.ai's GLM-5 series, which began in early 2026 as a 744-billion-parameter Mixture-of-Experts model trained completely on Huawei Ascend 910B chips using the MindSpore framework - no NVIDIA hardware. The base architecture carries over from GLM-5 and GLM-5.1: 384 experts with roughly 40B active per token, DeepSeek Sparse Attention for efficient long-context inference, and training on 28.5 trillion tokens.

The headline change in 5.2 is context. The window jumps from 200K (GLM-5.1) to 1 million tokens, with outputs capped at 131,072 tokens per response. Z.ai also added two thinking-effort modes - High and Max - replacing the single reasoning setting from the previous version. The company recommends Max for multi-step coding tasks.

The release landed during an acute access crisis for developers outside the US. Within hours of Fable 5's suspension, BridgeMind posted that GLM-5.2 had taken the top spot on BridgeBench Reasoning at 42.8, ahead of Fable 5, at 300 tokens per second and roughly 1/10th the subscription cost. Z.ai's stock (Knowledge Atlas Technology, HKEX: 2513) climbed 32.8% on the day, capping a 820% rise since the company's January IPO.

Key Specifications

Specification	Details
Provider	Zhipu AI (Z.ai)
Model Family	GLM
Parameters	744B total, ~40B active per token
Expert Count	384 (MoE)
Context Window	1,000,000 tokens
Max Output	131,072 tokens
Training Data	28.5T tokens
Attention	DeepSeek Sparse Attention
Training Hardware	Huawei Ascend 910B (MindSpore framework)
Input Price	Not disclosed (standalone API pending mid-June 2026)
Output Price	Not disclosed (standalone API pending mid-June 2026)
Release Date	2026-06-13
License	MIT

Benchmark Performance

Z.ai published no official benchmark results at launch. The company has said numbers are coming with the standalone API rollout, expected in the week of June 16. What's confirmed is the BridgeBench Reasoning score, which comes from a third party.

Benchmark	GLM-5.2	GLM-5.1	Notes
BridgeBench Reasoning	42.8	Not measured	#1 ranking per BridgeMind; beats Fable 5
SWE-Bench Pro	Not published	58.4	GLM-5.1 topped the leaderboard at launch
MCP-Atlas	Not published	71.8	Multi-step tool invocation benchmark
Code Arena Elo	Not published	1530 (#3 globally)	At GLM-5.1 launch

The predecessor's numbers are the best available baseline. GLM-5.1's 58.4 on SWE-Bench Pro cleared GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) at the time - see our SWE-Bench coding agent leaderboard for current standings. Whether 5.2 moves that needle at all is an open question. The 1M context upgrade and dual thinking modes are architectural changes, but Z.ai hasn't claimed a coding score improvement. Hold judgment on the BridgeBench number too: the benchmark measures reasoning breadth across 30 tasks and correlates with agent performance, but it isn't a substitute for SWE-Bench or LiveCodeBench on hard code generation.

"You cannot export control your way out of an open source." - BridgeMind, June 13, 2026

Key Capabilities

The 1M context window is the most concrete upgrade. For agentic coding work - long repo ingests, multi-file refactors, marathon sessions - this matters. The previous 200K limit meant agents had to compact frequently. At 1M tokens with CLAUDE_CODE_AUTO_COMPACT_WINDOW set to match, a GLM-5.2-backed Claude Code session can hold substantially more context before hitting the wall.

Z.ai explicitly targets the coding agent use case. The two thinking-effort modes (High and Max) let developers tune cost against depth. Max is the default recommendation; High presumably runs faster and cheaper. Neither the latency difference nor the cost ratio between modes has been published.

The model also ships as the "core engine" for ZCode 3.0, Z.ai's coding product line. Day-one API support covers Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code through an OpenAI-compatible endpoint. The model ID is glm-5.2[1m]. For Claude Code specifically, the config maps both Opus and Sonnet slots to the same GLM-5.2 endpoint while dropping Haiku to GLM-4.5-air.

Training on Huawei Ascend hardware continues to be a differentiator for the geopolitics of AI. Every generation of GLM-5 has been trained without a single NVIDIA chip. That matters for supply-chain independence - and for developers in jurisdictions where US export controls may eventually restrict access to US-trained model weights.

Pricing and Availability

GLM-5.2 is available now to all GLM Coding Plan subscribers at no extra charge:

Tier	Monthly Price	Prompt Quota
Lite	~$10/month	~400 prompts/week
Pro	~$30/month	~2,000 prompts/week
Max	~$80/month	~8,000 prompts/week
Team	Seat-based	Negotiated

Standalone API access (token-based pricing) is scheduled for mid-June 2026, alongside the open-weight release on the zai-org HuggingFace account under MIT license. Until those land, the only way to run GLM-5.2 is through the Z.ai Coding Plan or via the eight supported coding agents.

GLM-5 on OpenRouter currently prices at $0.60/M input and $1.92/M output tokens, which represents the Z.ai API tier from the previous generation. GLM-5.2's standalone pricing hasn't been announced, but the company has positioned it at "roughly 1/10th" of Claude Max and equivalent Anthropic tiers.

For comparison, Claude Max runs at the top subscription tier (around $200/month). The GLM Coding Plan Max tier at $80/month with ~8,000 prompts per week works out to about $0.01 per prompt if you hit the ceiling - competitive for high-volume agent workloads.

Strengths and Weaknesses

Strengths

1M token context window is a real architectural step up, directly useful for long agent sessions
MIT license with full open weights planned - the most permissive license in the GLM series history
Day-one integration for eight major coding agents removes setup friction
Trained entirely on domestic hardware - immune to NVIDIA-based export control scenarios
Price per prompt is substantially lower than comparable US frontier subscriptions
300 tokens/second inference speed is fast for a 744B model class

Weaknesses

No official benchmark numbers published at launch - BridgeBench is a third-party claim only
Standalone API and open weights were not available at launch; both pending mid-June rollout
Thinking-effort modes (High vs Max) have no published latency or cost breakdown
Prompt-quota pricing on the Coding Plan makes costs unpredictable for high-volume workloads
No published improvement in coding-specific benchmarks over GLM-5.1