
GLM-5.1 Review: Open-Source Model Tops SWE-Bench Pro
Z.ai's GLM-5.1 is a 754B open-weight model that claims the top spot on SWE-Bench Pro without a single NVIDIA chip - here's how it holds up in practice.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Z.ai's GLM-5.1 is a 754B open-weight model that claims the top spot on SWE-Bench Pro without a single NVIDIA chip - here's how it holds up in practice.

Rankings of top LLMs on function calling and tool use benchmarks including BFCL v3, tau-bench, ToolBench, and FinTrace as of April 2026.

OpenAI's April 2026 Agents SDK update adds sandboxed execution environments and three-layer guardrails for enterprises building long-horizon agentic systems.

Microsoft released the Agent Governance Toolkit, a seven-package open-source system that enforces policies on autonomous AI agents at sub-millisecond latency and covers all 10 OWASP agentic risks.

Researchers from Google DeepMind, Microsoft, and Columbia propose financial guardrails for AI agents, with simulations showing up to 61% reduction in user losses.

Anthropic released Claude Managed Agents in public beta today, a fully managed platform that handles sandboxing, state, and tool execution so developers can skip building agent infrastructure from scratch.

Claude Opus 4.6 leads SWE-bench Verified at 80.8% and OSWorld at 72.7% for agentic tasks, while GPT-5.4 ties for computer use; no single model dominates every workflow type.

Z.ai's GLM-5.1 scores 58.4 on SWE-bench Pro, edging out GPT-5.4 and Claude Opus 4.6, after being trained on 100,000 Huawei Ascend chips with no US silicon.

Cursor's ground-up IDE rebuild ships parallel agent orchestration, Design Mode for frontend work, and cloud-to-local session handoff - all in one unified workspace.

Claude Sonnet 4.6 and GPT-5.4 cost nearly the same per token but win on opposite benchmarks. Here is where each model leads and which to pick for your workload.

Alibaba officially launches Qwen3.6-Plus, a 1-million-token context model built for enterprise agentic coding and multimodal reasoning, now free on OpenRouter.

Arm says its 136-core AGI CPU is purpose-built for agentic AI workloads. Intel's data center chief - Arm's former head of solutions engineering - says the claim overstates what's actually new.