
GPT-5.4 Lands with Computer Use and 1M Token Context
OpenAI ships GPT-5.4 with built-in computer use that beats human desktop performance, a 1 million token context window, and native Excel and Google Sheets integrations.

OpenAI ships GPT-5.4 with built-in computer use that beats human desktop performance, a 1 million token context window, and native Excel and Google Sheets integrations.

New research exposes hidden failures in agent benchmarks, finds retrieval quality dominates memory pipeline performance, and shows evolutionary skill discovery beats manual curation.

Mercury 2 by Inception Labs is the fastest reasoning LLM available, built on diffusion architecture. We tested the speed, quality, and real-world trade-offs.

Kimi K2.5 leads every coding benchmark, but Qwen3.5-35B-A3B delivers 87-93% of that performance at 3-4x lower cost and runs on a single consumer GPU. Here is the full breakdown.

Inception Labs' Mercury 2 hits 1,196 tokens per second in independent testing - a diffusion architecture that rewires how inference works.

Gemini 3.1 Pro leads ARC-AGI-2, LiveCodeBench, and 11 other benchmarks with 750 million users and 21.5% market share - but developers report stalled responses, leaked thinking tokens, and API outages that make it unusable for production coding and agent workflows.