GLM-5 Arrives: 744B-Parameter Open-Source Model Built for Agents
Z.ai releases GLM-5, a 744B-parameter open-source Mixture-of-Experts model purpose-built for agentic tasks, scoring 77.8% on SWE-bench Verified and 56.2% on Terminal-Bench 2.0.

Z.ai, the AI research lab formerly known as Zhipu AI, has released GLM-5, a 744-billion-parameter open-source model that makes a clear bet on the future of AI: agents. While other models aim to be the best at everything, GLM-5 is specifically designed and optimized for complex system development and long-horizon agent tasks. The result is a model that may not top every general benchmark but excels precisely where it matters most for the emerging world of autonomous AI agents.
Architecture: Big Model, Efficient Design
GLM-5 uses a Mixture-of-Experts architecture with 744 billion total parameters, of which approximately 40 billion are active for any given input. This design philosophy, shared with models like DeepSeek V3.2, allows GLM-5 to pack enormous knowledge and capability into a model that is practical to run.
The 40 billion active parameters keep inference costs manageable while the full 744-billion-parameter pool gives the model access to a deep well of specialized knowledge. A learned router activates different experts for different inputs, meaning the model can bring focused expertise to coding, reasoning, planning, and execution without the cost of running every parameter on every query.
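GLM-5's exact routing scheme has not been published, but the general top-k MoE pattern is easy to illustrate. The toy PyTorch layer below (every dimension and the top-k value are made up for illustration) shows how a learned router sends each token to a small subset of experts, so only a fraction of the total parameters does work on any given input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a learned router picks the top-k experts
    for each token, so only a small slice of total parameters runs per input."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                 # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # per-token mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

Scaled up, this is how a 744B-parameter model can run with roughly the compute profile of a 40B-parameter dense one.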
Z.ai trained GLM-5 on 28.5 trillion tokens, a massive dataset that includes code, technical documentation, agent interaction traces, tool-use demonstrations, and general knowledge text. The emphasis on agent-relevant training data is a key differentiator. While most models are trained primarily on web text and code, GLM-5's training mix was deliberately weighted toward the kinds of tasks that agents need to perform.
Agent-First Design
What does it mean for a model to be "built for agents"? In practice, it means GLM-5 excels at several capabilities that are critical for autonomous operation but are often afterthoughts in general-purpose models.
First, there is planning. Agent tasks typically require breaking a complex goal into a sequence of steps, anticipating potential obstacles, and adapting the plan as new information emerges. GLM-5 demonstrates strong performance on planning benchmarks, showing an ability to decompose tasks effectively and maintain coherent strategies over long sequences of actions.
Second, there is tool use. Agents need to interact with external tools like file systems, terminals, APIs, databases, and web browsers. GLM-5 was trained extensively on tool-use scenarios, and it shows a natural fluency in constructing tool calls, interpreting results, and deciding what to do next based on the output.
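The mechanics of that loop are simple to sketch. The snippet below is a minimal, hypothetical harness (the JSON call format and tool names are illustrative, not GLM-5's actual schema): the model emits a structured call, the harness executes it, and the result is fed back as the next observation:

```python
import json
import subprocess

# Hypothetical tool registry and call format for illustration only; GLM-5's
# actual tool-calling schema may differ.
TOOLS = {
    "run_shell": lambda args: subprocess.run(
        args["command"], shell=True, capture_output=True, text=True
    ).stdout,
    "read_file": lambda args: open(args["path"]).read(),
}

def execute_tool_call(raw_call: str) -> str:
    """Parse a model-emitted tool call, run it, and return the observation
    to append to the conversation for the model's next turn."""
    call = json.loads(raw_call)      # e.g. {"tool": "run_shell", "arguments": {...}}
    try:
        return TOOLS[call["tool"]](call["arguments"])
    except Exception as exc:         # surface failures so the model can adapt
        return f"ERROR: {exc}"

# The model decides to inspect the working directory before acting:
print(execute_tool_call('{"tool": "run_shell", "arguments": {"command": "ls"}}'))
```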
Third, there is error recovery. Real-world agent tasks rarely go perfectly. Commands fail, APIs return unexpected results, and assumptions turn out to be wrong. GLM-5 has been specifically optimized to recognize when something has gone wrong, diagnose the problem, and try a different approach. This is harder than it sounds: many models will stubbornly repeat the same failed approach or give up prematurely when they encounter an error.
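One common way agent harnesses encourage this behavior is to feed every failure back to the model and reject exact repeats. Here is a minimal sketch of such a recovery loop; the propose/execute stand-ins are toys, where a real harness would call GLM-5 and a sandboxed environment:

```python
def recover_and_retry(goal, propose, execute, max_attempts=5):
    """Generic recovery loop: every failure is fed back to the model and exact
    repeats are rejected, forcing a diagnosis rather than blind retries."""
    failures = []  # (action, error) pairs shown to the model on each new turn
    for _ in range(max_attempts):
        action = propose(goal, failures)
        if any(action == prior for prior, _ in failures):
            failures.append((action, "repeated a previously failed action"))
            continue
        ok, observation = execute(action)
        if ok:
            return observation
        failures.append((action, observation))  # record the error for diagnosis
    raise RuntimeError(f"gave up after {max_attempts} attempts: {failures}")

# Toy stand-ins; a real harness would call GLM-5 for propose() and a sandbox
# for execute().
attempts = iter(["pip install foo", "pip install foo", "pip3 install foo"])
propose = lambda goal, failures: next(attempts)
execute = lambda action: (True, "installed") if action.startswith("pip3") else (False, "pip: command not found")
print(recover_and_retry("install foo", propose, execute))  # -> installed
```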
Fourth, there is long-horizon coherence. Agent tasks can involve dozens or hundreds of individual actions spread across hours of execution time. Maintaining a coherent understanding of the overall goal, the current state of progress, and the remaining steps requires a kind of working memory that many models struggle with. GLM-5's training on long agent interaction traces helps it maintain focus and coherence over extended task sequences.
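Model capability aside, agent frameworks usually help here by keeping an explicit working-memory object in the prompt. The sketch below is one generic pattern, not anything specific to GLM-5: the goal and a compact progress ledger stay in context even as the raw transcript gets trimmed:

```python
class AgentState:
    """Explicit working memory for a long-horizon task: the goal and a compact
    progress ledger stay in every prompt while raw transcript detail is trimmed."""

    def __init__(self, goal, max_recent=20):
        self.goal = goal
        self.progress = []      # one-line summaries of completed steps
        self.recent = []        # recent raw action/observation pairs
        self.max_recent = max_recent

    def record(self, raw_step, summary):
        self.progress.append(summary)
        self.recent.append(raw_step)
        self.recent = self.recent[-self.max_recent:]  # drop stale raw detail

    def to_prompt(self):
        return (
            f"GOAL: {self.goal}\n"
            f"PROGRESS: {'; '.join(self.progress)}\n"
            "RECENT STEPS:\n" + "\n".join(self.recent)
        )

state = AgentState("migrate the test suite to pytest")
state.record("$ grep -rl unittest tests/ -> 14 files", "found 14 files using unittest")
print(state.to_prompt())
```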
Benchmark Results
GLM-5 achieves 77.8% on SWE-bench Verified, a strong result that places it among the top-performing models on this influential coding benchmark. SWE-bench Verified tests whether a model can resolve real issues from open-source GitHub repositories, requiring it to understand the codebase, identify the root cause of the problem, and implement a correct fix.
On Terminal-Bench 2.0, GLM-5 scores 56.2%. While this is below the leading scores posted by Claude Opus 4.6 and GPT-5.2-Codex, it is an impressive result for an open-source model. Terminal-Bench 2.0 is particularly demanding because it requires models to operate autonomously in a terminal environment, using standard development tools to accomplish complex tasks with minimal guidance.
The gap between GLM-5 and the top proprietary models on Terminal-Bench 2.0 is notable but narrowing. Z.ai has indicated that future versions will focus on closing this gap, and the model's strong foundation in agent-relevant skills suggests there is significant room for improvement through continued training and optimization.
The Open-Source Angle
GLM-5 is released as an open-source model, continuing Z.ai's commitment to making frontier AI capabilities freely available. The model weights, training code, and documentation are all publicly accessible, allowing researchers and developers to study, modify, and deploy the model as they see fit.
This is particularly valuable for the agent development community. Building effective AI agents requires extensive experimentation with model behavior, fine-tuning for specific domains, and careful optimization of the interaction between the model and its tools. Having access to the full model, rather than just an API, enables a level of customization and control that is essential for pushing the boundaries of what agents can do.
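Assuming the weights land on Hugging Face as previous GLM releases have (the repo id below is a guess, so check Z.ai's release notes for the actual location), loading the model follows the standard transformers pattern:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "zai-org/GLM-5" is a guessed repo id for illustration; check Z.ai's release
# notes for the actual location and recommended inference settings.
model_id = "zai-org/GLM-5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # a checkpoint this size needs multi-GPU sharding
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Plan the steps to fix a failing CI job:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0]))
```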
Z.ai has also released a suite of agent-specific fine-tuning datasets and training recipes. These resources lower the barrier for developers who want to adapt GLM-5 for their specific agent use cases, whether that is automated software development, data analysis pipelines, or customer support automation.
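Adapting a checkpoint of this size is a multi-node undertaking in practice, but the shape of a parameter-efficient fine-tuning run can be sketched with peft. The repo id and target module names below are assumptions that depend on the released architecture:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative only: the repo id is a guess and the target module names are
# assumptions; consult Z.ai's published training recipes for real settings.
base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-5", device_map="auto", trust_remote_code=True
)
config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a tiny fraction of the 744B becomes trainable
```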
Looking Forward
GLM-5 represents a strategic bet that the future of AI is not just about building better chatbots but about building better agents. As the industry moves toward AI systems that can operate autonomously over extended periods, models that are purpose-built for this paradigm will have a natural advantage.
The agent market is growing rapidly, and the demand for models that can reliably plan, execute, and recover from errors is only going to increase. GLM-5 may not be the most capable model on every benchmark, but its focused design makes it a compelling choice for anyone building agent-based systems. And with continued development, the gap between GLM-5 and the proprietary leaders may continue to shrink.