
GAIA Benchmark Leaderboard: Best AI Agents May 2026
Rankings of the best AI models and agent frameworks on the GAIA benchmark, which tests real-world multi-step tasks requiring web browsing, tool use, and multi-hop reasoning.


OpenAI hired outside lawyers to explore a breach-of-contract case after its ChatGPT-Siri integration failed to generate anything close to the billions in subscription revenue it expected.

Ramp's May 2026 AI Index shows 34.4% of businesses now pay for Anthropic, edging out OpenAI at 32.3% - the first time Anthropic has led in enterprise adoption.

Sam Altman testified that Elon Musk demanded majority control of OpenAI from the beginning, with an opening ask of 90% equity - a revelation that reframes the entire lawsuit.

OpenAI's Daybreak initiative packages GPT-5.5 and Codex Security into a managed cybersecurity program with 20+ partners - a direct answer to Anthropic's Project Glasswing.

Updated May 2026: DeepSeek V4-Flash reasoning now $0.28/MTok output (8x cheaper than R1), o3-pro launched at $20/$80, Grok 4 retires May 15 - verified pricing across 11 models.

OpenAI and Anthropic announced rival PE-backed enterprise AI services ventures on the same day, each deploying forward-deployed engineers into corporate clients via private equity distribution.

Palisade Research shows frontier AI models autonomously exploit vulnerabilities and deploy working AI inference servers on remote machines, with success rates jumping from 5% to 81% in twelve months.

Nvidia has crossed $40 billion in equity commitments this year, investing in the same AI companies that buy its chips - raising serious questions about circular money flows in the ecosystem.

Six research teams disclosed exploits against Codex, Claude Code, Copilot, and Vertex AI. Every attack went after credentials the agents carried - not the models themselves.

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling - now generally available in the Realtime API.

OpenAI's Realtime API exits beta with GPT-Realtime-2, Translate, and Whisper - three specialized voice models splitting reasoning, translation, and transcription into distinct endpoints.