Reasoning

GPT-5.4

OpenAI's most capable frontier model combines native computer use, 1M-token context, and three variants at $2.50/$15 per million tokens.

Claude Code Brings Back 'Ultrathink' After Users Revolt

Claude Code 2.1.68 restores the ultrathink keyword after community backlash over quality degradation, while setting Opus 4.6 to medium effort by default for speed on daily tasks.

Cheaper Thinking, Web Traps, Denoised Agents

Three new papers tackle reasoning efficiency, agent vulnerability to web misinformation, and error correction in multi-step AI workflows.

GPT-5.2 - OpenAI's Flagship Reasoning Model

GPT-5.2 is OpenAI's most capable model with three modes, 400K context, and record-setting professional benchmarks - but speed and pricing raise questions.

Gemini 3 Deep Think

Google DeepMind's reasoning mode scores 84.6% on ARC-AGI-2, reaches a 3455 Codeforces Elo, and solves 18 previously unsolved research problems - outpacing Claude Opus 4.6 and GPT-5.2 on reasoning-heavy tasks.

Gemini 3.1 Pro Review: Google's Reasoning Leap Is Real - With Caveats

Google's Gemini 3.1 Pro more than doubles its predecessor's reasoning scores and introduces adjustable thinking modes, but latency issues and preview-status quirks keep it from a clean sweep.
