GLM-5.2 Ships MIT-Licensed, 1M Context, Zero Benchmarks
Zhipu AI's GLM-5.2 ships with 1M token context, 744B MoE parameters, and MIT license the day after Fable 5 goes offline - but no benchmark numbers at launch.

GLM-5.2 landed on Z.ai's Coding Plan on June 13 - exactly 24 hours after the US government ordered Fable 5 and Mythos 5 offline. Zhipu AI's line on release was direct: "frontier intelligence belongs to everyone." The model ships with a 1-million-token context window, an MIT license, and no vendor-published benchmark numbers. A full spec breakdown is available in our GLM-5.2 model profile.
That last part deserves attention before anything else.
Key Specs
| Spec | Value |
|---|---|
| Total Parameters | 744B |
| Active per Token | ~40B (MoE) |
| Experts | 384 |
| Context Window | 1,000,000 tokens |
| Max Output | 131,072 tokens |
| Pretraining Tokens | 28.5T |
| License | MIT |
| Release Date | June 13, 2026 |
| Pricing | Z.ai Coding Plan ($10-80/month) |
Under the Hood
Scale and MoE Design
GLM-5.2 runs 744 billion total parameters but activates only around 40 billion per forward pass, using a mixture-of-experts architecture with 384 expert networks. The design follows the path GLM-5.1 established: a large total parameter pool, sparse activation to keep inference affordable, and routing logic that selects experts per token at runtime.
Zhipu hasn't published routing details, head dimensions, or layer counts for the new release, so the architectural comparison to its predecessor stays at the top level. The active-parameter count matches GLM-5.1's 40B figure. What changed isn't the core model capacity but the context engine.
Context Window Engineering
Moving from 200K tokens to 1M tokens usably is harder than it sounds. Standard attention scales quadratically with sequence length, so a 5x context increase would mean a 25x compute hit at inference without architectural intervention. GLM-5.2 uses DeepSeek Sparse Attention to break that scaling curve, attending only to relevant subsets of the sequence rather than the full context at every layer.
Zhipu's documentation describes the result as "truly usable" long context with "high retrieval accuracy and coherence over long sequences." That language signals they're aware of the failure mode every long-context model deals with: the middle of the window quietly becoming invisible to the model. Independent retrieval testing at this scale isn't available yet.
The flagship glm-5.2[1m] variant caps output at 131,072 tokens - enough to refactor a mid-sized codebase in a single session without re-fetching files, which is the concrete developer workflow Zhipu is targeting.
Thinking Modes
Two presets: High and Max. Zhipu recommends Max as default for coding work. Max thinking adds 30-80% to first-token latency while roughly halving throughput on extended runs, a tradeoff any developer running long agentic loops will notice. High mode is faster but less thorough. Neither mode has published numbers showing which benchmark dimensions actually benefit from which setting.
What's Shipping Today
GLM-5.2 is accessible now through Z.ai's Coding Plan at three tiers. It supports Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code through OpenAI-compatible endpoints, so most developer setups can drop it in without changing any tooling code.
Zhipu AI, now trading as Z.ai, listed on the Hong Kong Stock Exchange in January 2026. Its stock has risen nearly 820% since the IPO.
Source: commons.wikimedia.org
Standalone API access is delayed until the weight release, targeting mid-June 2026. Until then, developers go through the subscription model. MIT-licensed open weights are promised for Hugging Face under the zai-org account - but they weren't there on day one.
The pricing is deliberately aggressive. Z.ai positions GLM-5.2 at roughly one-tenth the cost of Claude Max and Claude Code subscription tiers. With Fable 5 offline and Anthropic in active negotiations with the Commerce Department over tiered access terms, that price gap carries more weight now than it'd have a week ago.
The Benchmark Gap
This is where the story gets complicated.
| Benchmark | GLM-5.2 | GLM-5.1 | Kimi K2.7-Code | Notes |
|---|---|---|---|---|
| SWE-Bench Pro | Not published | 58.4 | ~60.5 | GLM-5.1 was open-weight SOTA at release |
| Terminal-Bench 2.0 | Not published | 63.5 | - | |
| MCP-Atlas | Not published | 71.8 | - | |
| BridgeBench Reasoning | 42.8* | - | - | *Community report; not verified by Zhipu |
Zhipu shipped GLM-5.2 with zero first-party benchmark numbers. No SWE-Bench Verified, no LiveCodeBench, no HumanEval. The one performance figure circulating in developer communities - 42.8 on BridgeBench Reasoning, allegedly ahead of Fable 5 - doesn't appear in Zhipu's own documentation and hasn't been independently reproduced.
The predecessor GLM-5.1 launched with a 58.4 SWE-Bench Pro score that earned it the open-weight SOTA title at the time. Kimi K2.7-Code has since pushed that frontier further. GLM-5.2 might be ahead of both, or it might not. From first-party data today, there's no way to tell.
Developer impressions shared in the first 48 hours are cautiously positive. GLM-5.2 handles whole-repository tasks without losing thread, and the million-token context holds up better than expected on long agentic sequences. Community sentiment isn't a benchmark, though - and MiniMax M3's launch showed that early enthusiasm doesn't always survive independent evaluation.
GLM-5.2 supports all major AI coding tools via OpenAI-compatible endpoints, with no tooling changes required for most developer setups.
Source: unsplash.com
What To Watch
Open weights release - Zhipu committed to publishing weights on Hugging Face by mid-June 2026. If that date slips, the story shifts from "MIT open-source release" to "closed-access model with a license promise." MiniMax M3 ran into a similar delay on weight publication. Developers have learned to treat "weights coming in ten days" as a claim to verify.
Independent benchmarks - The community will run SWE-Bench and LiveCodeBench evals within days of weight availability. If GLM-5.2's coding performance holds up against the 58.4 floor GLM-5.1 established, the million-token context upgrade is a real advance. If scores regress while context expanded, Zhipu traded depth for length.
Pricing sustainability - The current Coding Plan tiers at $10-80/month for 1M-context inference aren't obviously sustainable at scale. GLM-5.1 was priced at $0.95/M input tokens via the standard API, already below most Western providers. At a fraction of that through subscription tiers, the economics depend on how Zhipu is absorbing compute costs - and for how long.
The geopolitical shelf life - GLM-5.2 is a direct beneficiary of the regulatory disruption that pulled Fable 5 offline. That disruption isn't permanent. When Fable 5 returns, the demand signal that drove rapid GLM-5.2 adoption will partly unwind. The long-term question is whether Zhipu can build a developer base on the model's merits once the alternatives come back online.
Sources:
- Z.ai launches GLM-5.2 - MarkTechPost
- GLM 5.2 Release: 1M Context, Coding-First - Codersera
- GLM-5.2 Lands on Z.ai's Coding Plan - Digital Applied
- Zhipu AI stock rockets after GLM-5.2 open-source - South China Morning Post
- GLM-5.2: China's Open Frontier Model vs Anthropic Ban - Kunal Ganglani
- Zhipu AI Open-Sources GLM-5.2 - Pandaily
