
Microsoft Phi-4 Reasoning: Small Model, Big Math
Microsoft's Phi-4 reasoning family delivers near-70B-class math performance in a 14B open-weight package, but the overthinking problem is real and the use case is narrower than the benchmarks suggest.
