Reviews Articles

GPT-5.6 Sol Review: Strong Model, Thin Access

OpenAI's GPT-5.6 Sol tops Terminal-Bench 2.1 at 91.9% with its multi-agent Ultra mode, but reward-hacking findings and government-gated access keep it out of reach for nearly everyone.

Claude Sonnet 5 Review: Near-Opus at Half the Price

Anthropic's Sonnet 5 is the first mid-tier model that genuinely competes with Opus-class agents on coding and computer use, released June 30 at $2/$10 per million tokens.

GLM-5.2 Review: Best Open-Weight Coder at 1/6 Cost

Z.ai's GLM-5.2 delivers frontier coding performance with open weights and an MIT license at roughly one-sixth the cost of GPT-5.5 - but can it replace Claude Opus 4.8?

Grok 4.3 Review: xAI Bets on Price Over Prestige

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

DiffusionGemma 26B Review: 4x Faster, Real Tradeoffs

Google DeepMind's DiffusionGemma generates 1,000+ tokens per second through parallel diffusion, trading 5-19 benchmark points against Gemma 4 for speed and unique bidirectional generation capabilities.

Mistral Medium 3.5 Review: Open Agent, Sharp Teeth

Mistral's 128B open-weight model consolidates reasoning, coding, and vision into one checkpoint, with remote agents that file pull requests autonomously.

Yahoo Scout Review: Old-School Links, New-School AI

Yahoo Scout is the rare AI search engine that puts source links front and center - here's whether that philosophy holds up in practice.

Microsoft MAI Models: Seven-Model Suite Reviewed

A hands-on review of all seven MAI models - from the April transcription and image launch to Build 2026's MAI-Thinking-1, MAI-Code-1-Flash, and the multimodal upgrades.

Claude Fable 5 Review: Mythos Power, Real Guardrails

Claude Fable 5 delivers the strongest coding and long-context results Anthropic has ever shipped publicly, but its safety classifiers block enough legitimate work to make that power conditional.

GPT-Rosalind Review: The Gated Drug Discovery Model

OpenAI's life sciences reasoning model gets a June update with global access and new NGS plugins - strong benchmarks, but still locked behind a Trusted Access Program with no public pricing.

MiniMax M3 Review: The Price Disruptor with Caveats

MiniMax M3 arrives as the first open-weight model to combine frontier coding, a 1M-token context, and native multimodality at a fraction of proprietary pricing, but every benchmark figure is self-reported and the weights weren't even shipped at launch.

Claude Opus 4.8 Review: Reliability Over Raw Scores

Claude Opus 4.8 sets new highs on SWE-bench Pro and long-context tasks while a 4x improvement in code flaw detection may matter more than any benchmark number.