
Gemini 3.1 Pro
Google DeepMind's Gemini 3.1 Pro leads on 13 of 16 benchmarks, posting 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond, with a 1M-token context window priced at $2 per million input tokens.
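
As a rough illustration of the quoted pricing, the sketch below computes the input cost of a request at the stated $2 per million input tokens. It assumes a flat per-token rate and ignores output-token and caching costs, which the blurb does not quote; the constant and function names are ours, not Google's.

```python
# Back-of-the-envelope input cost at the quoted "$2/M input" rate.
# Assumes flat pricing; output-token and caching costs are excluded.

INPUT_PRICE_PER_MILLION = 2.00  # USD per 1M input tokens (from the article)

def input_cost(prompt_tokens: int) -> float:
    """Return the input cost in USD for a prompt of the given size."""
    return prompt_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION

if __name__ == "__main__":
    # A request that fills the full 1M-token context costs about $2 in input.
    for tokens in (10_000, 100_000, 1_000_000):
        print(f"{tokens:>9,} input tokens -> ${input_cost(tokens):.2f}")
```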

Google's Gemini 3 Deep Think trades speed for depth, delivering record-setting scores on reasoning benchmarks, but at a steep price.

Google releases Gemini 3.1 Pro with 77.1% on ARC-AGI-2, more than doubling its predecessor's score and beating Claude Opus 4.6 and GPT-5.2 on most industry benchmarks.

Rankings of AI models on the hardest reasoning benchmarks available: GPQA Diamond, AIME competition math, and the notoriously difficult Humanity's Last Exam.

Anthropic launches Claude Opus 4.6, featuring agent teams, adaptive thinking, a 1M-token context window, and state-of-the-art performance on Terminal-Bench 2.0 and Humanity's Last Exam.