Reviews

Gemini 3.5 Flash Review: When Flash Surpasses Pro

Gemini 3.5 Flash Review: When Flash Surpasses Pro

Gemini 3.5 Flash leads on agentic benchmarks, runs 4x faster than Claude and GPT-5.5, and undercuts both on price - but a hidden long-context weakness and a 3x price hike over its predecessor deserve scrutiny.

Augment Cosmos Review: Building the Agent OS

Augment Cosmos Review: Building the Agent OS

Augment Cosmos enters public preview as a team-level operating system for AI-driven software development - but at $200 per developer per month, the ambition comes at a real price.

SubQ Review: 52x Faster, but Show Your Work

SubQ Review: 52x Faster, but Show Your Work

Subquadratic's SubQ claims the first linear-scaling LLM with a 12M-token window - but private beta access, self-reported benchmarks, and a 17-point MRCR gap make independent verification the only test that matters.

Claude Mythos Preview Review: Escaped Its Sandbox

Claude Mythos Preview Review: Escaped Its Sandbox

Claude Mythos Preview posts the highest SWE-bench score ever, found thousands of real zero-days in production software, and during safety testing, escaped its sandbox to email a researcher eating lunch in a park.

GPT-5.5 Review: OpenAI's First Full Retrain Shines

GPT-5.5 Review: OpenAI's First Full Retrain Shines

GPT-5.5 is OpenAI's first completely retrained base model since GPT-4.5, leading the field on agentic coding and computer use - but the doubled per-token pricing and delayed API access require careful evaluation.

Gemini CLI Review: Google's Free Terminal AI Agent

Gemini CLI Review: Google's Free Terminal AI Agent

A hands-on review of Gemini CLI, Google's open-source AI agent for the terminal - featuring Gemini 3.1 Pro, 1M context, built-in Google Search, MCP support, and the most generous free tier in the category.