GLM-5.2

Z.ai's GLM-5.2 is a 744B open-weight MoE model with a 1M token context window, MIT license, and first-day support for eight coding agents at roughly 1/10th the cost of US frontier models.

GLM-5.2

Z.ai released GLM-5.2 on June 13, 2026 - one day after the US Commerce Department suspended global access to Claude Fable 5. The timing was deliberate. Jie Tang, Zhipu's founder, opened his launch post with: "the sudden restriction of certain frontier models is deeply regrettable." The model ships under MIT license with open weights and day-one API support for eight coding agents, including Claude Code, Cline, and Roo Code.

TL;DR

  • 744B total parameters, 40B active per token (MoE), 1M token context window - up from 200K in GLM-5.1
  • Priced at roughly 1/10th the cost of comparable US frontier tiers; included in GLM Coding Plan ($10-$80/month) at no extra charge
  • Claims BridgeBench Reasoning #1 at 42.8, beating Fable 5 - but Z.ai published no independent benchmark numbers at launch

Overview

GLM-5.2 is the third major release in Z.ai's GLM-5 series, which began in early 2026 as a 744-billion-parameter Mixture-of-Experts model trained completely on Huawei Ascend 910B chips using the MindSpore framework - no NVIDIA hardware. The base architecture carries over from GLM-5 and GLM-5.1: 384 experts with roughly 40B active per token, DeepSeek Sparse Attention for efficient long-context inference, and training on 28.5 trillion tokens.

The headline change in 5.2 is context. The window jumps from 200K (GLM-5.1) to 1 million tokens, with outputs capped at 131,072 tokens per response. Z.ai also added two thinking-effort modes - High and Max - replacing the single reasoning setting from the previous version. The company recommends Max for multi-step coding tasks.

The release landed during an acute access crisis for developers outside the US. Within hours of Fable 5's suspension, BridgeMind posted that GLM-5.2 had taken the top spot on BridgeBench Reasoning at 42.8, ahead of Fable 5, at 300 tokens per second and roughly 1/10th the subscription cost. Z.ai's stock (Knowledge Atlas Technology, HKEX: 2513) climbed 32.8% on the day, capping a 820% rise since the company's January IPO.

Key Specifications

SpecificationDetails
ProviderZhipu AI (Z.ai)
Model FamilyGLM
Parameters744B total, ~40B active per token
Expert Count384 (MoE)
Context Window1,000,000 tokens
Max Output131,072 tokens
Training Data28.5T tokens
AttentionDeepSeek Sparse Attention
Training HardwareHuawei Ascend 910B (MindSpore framework)
Input PriceNot disclosed (standalone API pending mid-June 2026)
Output PriceNot disclosed (standalone API pending mid-June 2026)
Release Date2026-06-13
LicenseMIT

Benchmark Performance

Z.ai published no official benchmark results at launch. The company has said numbers are coming with the standalone API rollout, expected in the week of June 16. What's confirmed is the BridgeBench Reasoning score, which comes from a third party.

BenchmarkGLM-5.2GLM-5.1Notes
BridgeBench Reasoning42.8Not measured#1 ranking per BridgeMind; beats Fable 5
SWE-Bench ProNot published58.4GLM-5.1 topped the leaderboard at launch
MCP-AtlasNot published71.8Multi-step tool invocation benchmark
Code Arena EloNot published1530 (#3 globally)At GLM-5.1 launch

The predecessor's numbers are the best available baseline. GLM-5.1's 58.4 on SWE-Bench Pro cleared GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) at the time - see our SWE-Bench coding agent leaderboard for current standings. Whether 5.2 moves that needle at all is an open question. The 1M context upgrade and dual thinking modes are architectural changes, but Z.ai hasn't claimed a coding score improvement. Hold judgment on the BridgeBench number too: the benchmark measures reasoning breadth across 30 tasks and correlates with agent performance, but it isn't a substitute for SWE-Bench or LiveCodeBench on hard code generation.

"You cannot export control your way out of an open source." - BridgeMind, June 13, 2026

Key Capabilities

The 1M context window is the most concrete upgrade. For agentic coding work - long repo ingests, multi-file refactors, marathon sessions - this matters. The previous 200K limit meant agents had to compact frequently. At 1M tokens with CLAUDE_CODE_AUTO_COMPACT_WINDOW set to match, a GLM-5.2-backed Claude Code session can hold substantially more context before hitting the wall.

Z.ai explicitly targets the coding agent use case. The two thinking-effort modes (High and Max) let developers tune cost against depth. Max is the default recommendation; High presumably runs faster and cheaper. Neither the latency difference nor the cost ratio between modes has been published.

The model also ships as the "core engine" for ZCode 3.0, Z.ai's coding product line. Day-one API support covers Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code through an OpenAI-compatible endpoint. The model ID is glm-5.2[1m]. For Claude Code specifically, the config maps both Opus and Sonnet slots to the same GLM-5.2 endpoint while dropping Haiku to GLM-4.5-air.

Training on Huawei Ascend hardware continues to be a differentiator for the geopolitics of AI. Every generation of GLM-5 has been trained without a single NVIDIA chip. That matters for supply-chain independence - and for developers in jurisdictions where US export controls may eventually restrict access to US-trained model weights.

Pricing and Availability

GLM-5.2 is available now to all GLM Coding Plan subscribers at no extra charge:

TierMonthly PricePrompt Quota
Lite~$10/month~400 prompts/week
Pro~$30/month~2,000 prompts/week
Max~$80/month~8,000 prompts/week
TeamSeat-basedNegotiated

Standalone API access (token-based pricing) is scheduled for mid-June 2026, alongside the open-weight release on the zai-org HuggingFace account under MIT license. Until those land, the only way to run GLM-5.2 is through the Z.ai Coding Plan or via the eight supported coding agents.

GLM-5 on OpenRouter currently prices at $0.60/M input and $1.92/M output tokens, which represents the Z.ai API tier from the previous generation. GLM-5.2's standalone pricing hasn't been announced, but the company has positioned it at "roughly 1/10th" of Claude Max and equivalent Anthropic tiers.

For comparison, Claude Max runs at the top subscription tier (around $200/month). The GLM Coding Plan Max tier at $80/month with ~8,000 prompts per week works out to about $0.01 per prompt if you hit the ceiling - competitive for high-volume agent workloads.

Strengths and Weaknesses

Strengths

  • 1M token context window is a real architectural step up, directly useful for long agent sessions
  • MIT license with full open weights planned - the most permissive license in the GLM series history
  • Day-one integration for eight major coding agents removes setup friction
  • Trained entirely on domestic hardware - immune to NVIDIA-based export control scenarios
  • Price per prompt is substantially lower than comparable US frontier subscriptions
  • 300 tokens/second inference speed is fast for a 744B model class

Weaknesses

  • No official benchmark numbers published at launch - BridgeBench is a third-party claim only
  • Standalone API and open weights were not available at launch; both pending mid-June rollout
  • Thinking-effort modes (High vs Max) have no published latency or cost breakdown
  • Prompt-quota pricing on the Coding Plan makes costs unpredictable for high-volume workloads
  • No published improvement in coding-specific benchmarks over GLM-5.1

Sources

✓ Last verified June 15, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.