
Claude Sonnet 5 Review: Near-Opus at Half the Price
Anthropic's Sonnet 5 is the first mid-tier model that genuinely competes with Opus-class agents on coding and computer use, released June 30 at $2/$10 per million tokens.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Anthropic's Sonnet 5 is the first mid-tier model that genuinely competes with Opus-class agents on coding and computer use, released June 30 at $2/$10 per million tokens.

Z.ai's GLM-5.2 delivers frontier coding performance with open weights and MIT license at roughly one-sixth the cost of GPT-5.5 - but can it replace Claude Opus 4.8?

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

Google DeepMind's DiffusionGemma generates 1,000+ tokens per second through parallel diffusion, trading 5-19 benchmark points against Gemma 4 for speed and unique bidirectional generation capabilities.

Mistral's 128B open-weight model consolidates reasoning, coding, and vision into one checkpoint, with remote agents that file pull requests autonomously.

Yahoo Scout is the rare AI search engine that puts source links front and center - here's whether that philosophy holds up in practice.

A hands-on review of all seven MAI models - from the April transcription and image launch to Build 2026's MAI-Thinking-1, MAI-Code-1-Flash, and the multimodal upgrades.

Claude Fable 5 delivers the strongest coding and long-context results Anthropic has ever shipped publicly, but its safety classifiers block enough legitimate work to make that power conditional.

OpenAI's life sciences reasoning model gets a June update with global access and new NGS plugins - strong benchmarks, but still locked behind a Trusted Access Program with no public pricing.

MiniMax M3 arrives as the first open-weight model to combine frontier coding, 1M-token context, and native multimodality - at a fraction of proprietary pricing - but every benchmark figure is self-reported and the weights weren't even shipped at launch.

Claude Opus 4.8 sets new highs on SWE-bench Pro and long-context tasks while a 4x improvement in code flaw detection may matter more than any benchmark number.

Google's Antigravity 2.0 rewrites the platform from a browser IDE into a five-surface agent suite. The architecture is ambitious, the launch was a mess.