Articles Tagged "Agentic AI"

GLM-5.2 Ships MIT-Licensed, 1M Context, Zero Benchmarks

Zhipu AI's GLM-5.2 ships with a 1M-token context window, 744B MoE parameters, and an MIT license the day after Fable 5 goes offline, but with no benchmark numbers at launch.

GLM-5.2

Z.ai's GLM-5.2 is a 744B open-weight MoE model with a 1M token context window, MIT license, and first-day support for eight coding agents at roughly 1/10th the cost of US frontier models.

AgentPerf - First Infrastructure Benchmark for Agents

Artificial Analysis released AgentPerf, the first agentic AI infrastructure benchmark, measuring concurrent agents per megawatt. NVIDIA Blackwell leads with 20x gains over Hopper.

Kimi K2.7-Code

Moonshot AI's Kimi K2.7-Code is a 1T-parameter open-weight MoE coding model with mandatory thinking mode, 256K context, and 30% fewer reasoning tokens than K2.6.

Kimi K2.7-Code - Moonshot's Open-Weight Coding Leap

Moonshot AI ships Kimi K2.7-Code with 30% fewer reasoning tokens and a 21.8% gain on its own coding benchmarks, but the model still trails Claude Opus 4.8 on most tests in the same table.

Niteshift Raises $7M to Be the Cloud for Coding Agents

Former Datadog engineers launch Niteshift, a $7M-backed cloud platform that runs AI coding agents in full-stack environments with model-agnostic routing.

Context Overload, Memory Leaks, and Agent Safety

Three new arXiv papers expose how context bloat tanks agent performance, agent memory bleeds private data, and misaligned behavior spreads through multi-agent systems.

MAI-Code-1-Flash

Microsoft's first in-house coding model is a 137B sparse MoE built natively for GitHub Copilot that beats Claude Haiku 4.5 on SWE-Bench Pro by 16 points.

Ministral 3 8B

Mistral AI's mid-tier open-weight edge model - 8B parameters, 256K context, Apache 2.0 license, built for agentic pipelines and cost-sensitive production workloads.

Devstral 2

Mistral's open-weight coding agent model - 123B parameters, 256K context window, 72.2% on SWE-bench Verified, priced at $0.40/M input tokens.

Grok Build 0.1

Grok Build 0.1 is xAI's first model built specifically for agentic coding workflows, with a 256K context window, native MCP support, and always-on reasoning at $1/M input tokens.

MiniMax M3 Makes 1M Context Viable With Sparse Attention

MiniMax M3 uses sparse attention to cut long-context inference cost 20x, topping GPT-5.5 on coding benchmarks at a fraction of the price.
