
AI2 Fires Up $152M Blackwell Cluster for Open Science
AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models fully open to researchers.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models fully open to researchers.

Qwen3.6-27B is a 27B dense open-weight multimodal model from Alibaba that scores 77.2% on SWE-bench Verified - beating Alibaba's own 397B MoE - under Apache 2.0.

Zyphra's ZAYA1-8B matches Claude 4.5 Sonnet on HMMT 2025 math benchmarks at just 760M active parameters, trained entirely on AMD Instinct MI300X GPUs under Apache 2.0.

Zyphra's ZAYA1-8B is an 8.4B-parameter MoE reasoning model with only 760M active parameters that matches DeepSeek-R1-0528 on math and coding benchmarks while running at a fraction of the compute cost.

Three new papers show that more agent components backfire, reasoning models hide unsafe thinking, and vision-language models waste most of their attention.

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling - now generally available in the Realtime API.

OpenAI's Realtime API exits beta with GPT-Realtime-2, Translate, and Whisper - three specialized voice models splitting reasoning, translation, and transcription into distinct endpoints.

MiniMax M2.7 is the first open-weight frontier model to automate 30-50% of its own training pipeline - but a controversial license change and sluggish speed complicate the story.
MiniMax M2.7 is a 230B MoE coding agent that handles 30-50% of MiniMax's own RL research workflow, scoring 56.22% on SWE-Pro and 78% on SWE-bench Verified at $0.30/M input tokens.

Google DeepMind's May 2026 AlphaEvolve impact report shows the system running in production across infrastructure, quantum computing, genomics, and commercial partnerships spanning logistics to fintech.

Three new papers deliver a runtime safety firewall for agent tools, challenge how we measure AI alignment, and introduce elastic context management for long-horizon search agents.

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.