
Gemini 2.5 Pro
Google DeepMind's flagship thinking model with 1M-token context, 84% GPQA Diamond, and native multimodal understanding of text, images, audio, and video.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Google DeepMind's flagship thinking model with 1M-token context, 84% GPQA Diamond, and native multimodal understanding of text, images, audio, and video.

Subquadratic's SubQ claims the first linear-scaling LLM with a 12M-token window - but private beta access, self-reported benchmarks, and a 17-point MRCR gap make independent verification the only test that matters.

OpenAI's coding-optimized API model with a 1M token context window, 54.6% SWE-bench Verified score, and $2/$8 per million token pricing.

Anthropic's first hybrid reasoning model with togglable extended thinking, a 200K context window, and state-of-the-art SWE-bench performance at $3/$15 per million tokens.

SubQ is the first LLM built on a fully subquadratic attention architecture, achieving a 12M-token research context and 52x faster inference than FlashAttention at 1M tokens.

Subquadratic exits stealth with SubQ, the first frontier model built on a sparse-attention architecture, a $29M seed round, and a 12M-token context window that costs a fraction of Opus.

Qwen3.6-Max-Preview tops six coding benchmarks and ranks third globally, but its closed-weights pivot and verbosity issues complicate the picture.

DeepSeek V4 ships in two open-weight MoE variants - V4-Pro at 1.6T/49B active and V4-Flash at 284B/13B active - both with 1M-token context and MIT license, released April 24, 2026.

MIT researchers show that treating long documents as a Python environment - and letting models recursively spawn sub-models to explore them - beats RAG and extended context windows on every benchmark tested.

LG AI Research's first open-weight vision-language model packs 33B parameters, 262K context, and STEM scores above GPT-5-mini - but ships under a non-commercial license.

Alibaba's Qwen3.5-Omni takes text, images, audio, and video as input and streams both text and speech output in a single end-to-end model with a 256K context window.

Alibaba's first closed-weights flagship Qwen ships with a 256K context window, tops six agentic coding benchmarks, and ranks third on the Artificial Analysis Intelligence Index.