GLM-5.2 Ships MIT-Licensed, 1M Context, Zero Benchmarks

Zhipu AI's GLM-5.2 ships with 1M token context, 744B MoE parameters, and MIT license the day after Fable 5 goes offline - but no benchmark numbers at launch.

GLM-5.2 Ships MIT-Licensed, 1M Context, Zero Benchmarks

GLM-5.2 landed on Z.ai's Coding Plan on June 13 - exactly 24 hours after the US government ordered Fable 5 and Mythos 5 offline. Zhipu AI's line on release was direct: "frontier intelligence belongs to everyone." The model ships with a 1-million-token context window, an MIT license, and no vendor-published benchmark numbers. A full spec breakdown is available in our GLM-5.2 model profile.

That last part deserves attention before anything else.

Key Specs

SpecValue
Total Parameters744B
Active per Token~40B (MoE)
Experts384
Context Window1,000,000 tokens
Max Output131,072 tokens
Pretraining Tokens28.5T
LicenseMIT
Release DateJune 13, 2026
PricingZ.ai Coding Plan ($10-80/month)

Under the Hood

Scale and MoE Design

GLM-5.2 runs 744 billion total parameters but activates only around 40 billion per forward pass, using a mixture-of-experts architecture with 384 expert networks. The design follows the path GLM-5.1 established: a large total parameter pool, sparse activation to keep inference affordable, and routing logic that selects experts per token at runtime.

Zhipu hasn't published routing details, head dimensions, or layer counts for the new release, so the architectural comparison to its predecessor stays at the top level. The active-parameter count matches GLM-5.1's 40B figure. What changed isn't the core model capacity but the context engine.

Context Window Engineering

Moving from 200K tokens to 1M tokens usably is harder than it sounds. Standard attention scales quadratically with sequence length, so a 5x context increase would mean a 25x compute hit at inference without architectural intervention. GLM-5.2 uses DeepSeek Sparse Attention to break that scaling curve, attending only to relevant subsets of the sequence rather than the full context at every layer.

Zhipu's documentation describes the result as "truly usable" long context with "high retrieval accuracy and coherence over long sequences." That language signals they're aware of the failure mode every long-context model deals with: the middle of the window quietly becoming invisible to the model. Independent retrieval testing at this scale isn't available yet.

The flagship glm-5.2[1m] variant caps output at 131,072 tokens - enough to refactor a mid-sized codebase in a single session without re-fetching files, which is the concrete developer workflow Zhipu is targeting.

Thinking Modes

Two presets: High and Max. Zhipu recommends Max as default for coding work. Max thinking adds 30-80% to first-token latency while roughly halving throughput on extended runs, a tradeoff any developer running long agentic loops will notice. High mode is faster but less thorough. Neither mode has published numbers showing which benchmark dimensions actually benefit from which setting.

What's Shipping Today

GLM-5.2 is accessible now through Z.ai's Coding Plan at three tiers. It supports Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code through OpenAI-compatible endpoints, so most developer setups can drop it in without changing any tooling code.

Z.ai logo, the company behind GLM-5.2 Zhipu AI, now trading as Z.ai, listed on the Hong Kong Stock Exchange in January 2026. Its stock has risen nearly 820% since the IPO. Source: commons.wikimedia.org

Standalone API access is delayed until the weight release, targeting mid-June 2026. Until then, developers go through the subscription model. MIT-licensed open weights are promised for Hugging Face under the zai-org account - but they weren't there on day one.

The pricing is deliberately aggressive. Z.ai positions GLM-5.2 at roughly one-tenth the cost of Claude Max and Claude Code subscription tiers. With Fable 5 offline and Anthropic in active negotiations with the Commerce Department over tiered access terms, that price gap carries more weight now than it'd have a week ago.

The Benchmark Gap

This is where the story gets complicated.

BenchmarkGLM-5.2GLM-5.1Kimi K2.7-CodeNotes
SWE-Bench ProNot published58.4~60.5GLM-5.1 was open-weight SOTA at release
Terminal-Bench 2.0Not published63.5-
MCP-AtlasNot published71.8-
BridgeBench Reasoning42.8*--*Community report; not verified by Zhipu

Zhipu shipped GLM-5.2 with zero first-party benchmark numbers. No SWE-Bench Verified, no LiveCodeBench, no HumanEval. The one performance figure circulating in developer communities - 42.8 on BridgeBench Reasoning, allegedly ahead of Fable 5 - doesn't appear in Zhipu's own documentation and hasn't been independently reproduced.

The predecessor GLM-5.1 launched with a 58.4 SWE-Bench Pro score that earned it the open-weight SOTA title at the time. Kimi K2.7-Code has since pushed that frontier further. GLM-5.2 might be ahead of both, or it might not. From first-party data today, there's no way to tell.

Developer impressions shared in the first 48 hours are cautiously positive. GLM-5.2 handles whole-repository tasks without losing thread, and the million-token context holds up better than expected on long agentic sequences. Community sentiment isn't a benchmark, though - and MiniMax M3's launch showed that early enthusiasm doesn't always survive independent evaluation.

A developer working with code on a laptop GLM-5.2 supports all major AI coding tools via OpenAI-compatible endpoints, with no tooling changes required for most developer setups. Source: unsplash.com

What To Watch

Open weights release - Zhipu committed to publishing weights on Hugging Face by mid-June 2026. If that date slips, the story shifts from "MIT open-source release" to "closed-access model with a license promise." MiniMax M3 ran into a similar delay on weight publication. Developers have learned to treat "weights coming in ten days" as a claim to verify.

Independent benchmarks - The community will run SWE-Bench and LiveCodeBench evals within days of weight availability. If GLM-5.2's coding performance holds up against the 58.4 floor GLM-5.1 established, the million-token context upgrade is a real advance. If scores regress while context expanded, Zhipu traded depth for length.

Pricing sustainability - The current Coding Plan tiers at $10-80/month for 1M-context inference aren't obviously sustainable at scale. GLM-5.1 was priced at $0.95/M input tokens via the standard API, already below most Western providers. At a fraction of that through subscription tiers, the economics depend on how Zhipu is absorbing compute costs - and for how long.

The geopolitical shelf life - GLM-5.2 is a direct beneficiary of the regulatory disruption that pulled Fable 5 offline. That disruption isn't permanent. When Fable 5 returns, the demand signal that drove rapid GLM-5.2 adoption will partly unwind. The long-term question is whether Zhipu can build a developer base on the model's merits once the alternatives come back online.


Sources:

Sophie Zhang
About the author AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.