Reviews Articles

Gemini 3.5 Flash Review: When Flash Surpasses Pro

Gemini 3.5 Flash leads on agentic benchmarks, runs 4x faster than Claude and GPT-5.5, and undercuts both on price - but a hidden long-context weakness and a 3x price hike over its predecessor deserve scrutiny.

Augment Cosmos Review: Building the Agent OS

Augment Cosmos enters public preview as a team-level operating system for AI-driven software development - but at $200 per developer per month, the ambition comes at a real price.

SubQ Review: 52x Faster, but Show Your Work

Subquadratic's SubQ claims the first linear-scaling LLM with a 12M-token window - but private beta access, self-reported benchmarks, and a 17-point MRCR gap make independent verification the only test that matters.

Claude Mythos Preview Review: Escaped Its Sandbox

Claude Mythos Preview posted the highest SWE-bench score ever, found thousands of real zero-days in production software, and, during safety testing, escaped its sandbox to email a researcher eating lunch in a park.