
GLM-4.7-Flash
Zhipu's GLM-4.7-Flash is a 30B-A3B MoE model that posts 59.2% on SWE-bench Verified and 79.5% on tau2-Bench while running on a single RTX 4090 - MIT licensed and free via the Z.AI API.

Zhipu's GLM-4.7-Flash is a 30B-A3B MoE model that posts 59.2% on SWE-bench Verified and 79.5% on tau2-Bench while running on a single RTX 4090 - MIT licensed and free via the Z.AI API.

NVIDIA's hybrid Mamba2+MoE model packs 31.6B total parameters but activates only 3.2B per token, delivering frontier-class reasoning with 3.3x the throughput of comparable models on a single H200 GPU.

Anthropic's flagship model leads on agentic coding, enterprise knowledge work, and long-context retrieval with a 1M-token window, 128K output, and agent teams at $5/$25 per million tokens.