
AI2 Fires Up $152M Blackwell Cluster for Open Science
AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models fully open to researchers.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models fully open to researchers.

Zyphra's ZAYA1-8B matches Claude 4.5 Sonnet on HMMT 2025 math benchmarks at just 760M active parameters, trained entirely on AMD Instinct MI300X GPUs under Apache 2.0.

Zyphra's ZAYA1-8B is an 8.4B-parameter MoE reasoning model with only 760M active parameters that matches DeepSeek-R1-0528 on math and coding benchmarks while running at a fraction of the compute cost.

Three new papers show that more agent components backfire, reasoning models hide unsafe thinking, and vision-language models waste most of their attention.

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling - now generally available in the Realtime API.

OpenAI's Realtime API exits beta with GPT-Realtime-2, Translate, and Whisper - three specialized voice models splitting reasoning, translation, and transcription into distinct endpoints.

MiniMax M2.7 is the first open-weight frontier model to automate 30-50% of its own training pipeline - but a controversial license change and sluggish speed complicate the story.

Google DeepMind's May 2026 AlphaEvolve impact report shows the system running in production across infrastructure, quantum computing, genomics, and commercial partnerships spanning logistics to fintech.

Three new papers deliver a runtime safety firewall for agent tools, challenge how we measure AI alignment, and introduce elastic context management for long-horizon search agents.

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.

Claude Opus 4.7 scores 87.6% on SWE-bench Verified but costs $5/$25 per million tokens. These four models match or near-match its coding performance at a fraction of the price on OpenRouter.

Gemini 3.1 Pro leads verified 2026 benchmarks at $2 per million tokens while GPT-5.5 and Claude Opus 4.7 postdate available translation evaluations - rankings, scores, and pricing for 10 models.