
Open Agent Leaderboard: Model Beats Architecture
IBM Research tests 25 agent configurations across 6 real-world benchmarks and finds backbone model choice matters 58x more than agent framework design.

South Korean startup LetinAR raises $18.5M to scale its PinTILT optical modules, which already power AI glasses and AR helmets as shipments hit 8.7 million units globally in 2025.

Google's next video model surfaced in the Gemini UI a week before I/O 2026, showing editing-first features including in-chat watermark removal and object swapping.

OpenAI launched ChatGPT Personal Finance on May 15, giving Pro users read access to 12,000+ banks via Plaid, one day after a class action alleged that OpenAI shared user conversations with Meta and Google.

Tracking AI supply-chain attacks, agent exploits, prompt injection, model leaks, and the real-world incidents shaping AI security today.

ArXiv is issuing one-year submission bans to authors whose papers contain verifiable unvetted AI output, as fabricated academic citations have risen tenfold since 2023.

OpenAI's Codex coding agent arrives on iPhone and Android as a remote control for desktop sessions, with QR code pairing and live terminal output for its 4 million weekly users.

Raindrop's MIT-licensed Workshop streams every token and tool call from your AI agent to a local browser dashboard, then lets Claude Code write and fix evaluations automatically.

Mira Murati's startup unveils TML-Interaction-Small, a 276B MoE model that hits 0.40-second response latency by listening and generating speech at the same time.

Cerebras raised $5.55B and surged 68% on Nasdaq debut, the largest US tech IPO since Snowflake in 2020, but a 200x revenue multiple and a single-customer backlog tell a more complicated story.

IBM's Granite Embedding Multilingual R2 ships with a 64x context window jump, ModernBERT architecture, and Apache 2.0 licensing that makes it enterprise-safe out of the box.

xAI operates 46 gas turbines at its Southaven data center power plant, five more than its state permit allows, as the NAACP seeks an emergency court order over alleged Clean Air Act violations.