
Best AI Models for Code Generation - April 2026
Claude Opus 4.6 and GPT-5.4 lead different code benchmarks in April 2026 - pick based on your workflow, not one score.

MiniMax's new 2,300B MoE model tops the Artificial Analysis Intelligence Index and claims to run 30-50% of its own RL research workflow autonomously.

Cursor launches Composer 2, its first in-house coding model trained via RL on long-horizon tasks, scoring 73.7 on SWE-bench Multilingual at $0.50/M input tokens.

METR found that maintainers would reject roughly half of AI-generated PRs that pass SWE-bench's automated grading, a 24-point gap suggesting that benchmark scores substantially overstate production readiness.

Rankings of the best AI models for coding across the SWE-Bench, Terminal-Bench, and LiveCodeBench benchmarks, which measure real-world software engineering and algorithmic problem-solving ability.