
Qwen3.5-9B
Qwen3.5-9B is a 9B dense model that outperforms Qwen3-30B on most benchmarks and beats GPT-5-Nano on vision tasks. Natively multimodal with 262K-1M context, Apache 2.0 licensed.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Qwen3.5-9B is a 9B dense model that outperforms Qwen3-30B on most benchmarks and beats GPT-5-Nano on vision tasks. Natively multimodal with 262K-1M context, Apache 2.0 licensed.

GPT-5.2 is OpenAI's most capable model with three modes, 400K context, and record-setting professional benchmarks - but speed and pricing raise questions.

Google DeepMind's reasoning mode scores 84.6% on ARC-AGI-2, 3455 Codeforces Elo, and solves 18 previously unsolved research problems - outpacing Claude Opus 4.6 and GPT-5.2 on reasoning-heavy tasks.

Google DeepMind's natively multimodal image generation and editing model built on Gemini 3.1 Flash - Pro-level quality at Flash speed, free for all Gemini users.

Moonshot AI's Kimi K2.5 is a 1T-parameter MoE model activating 32B per token with native multimodal vision via MoonViT-3D, Agent Swarm coordination of up to 100 sub-agents via PARL, and top-tier math and coding benchmarks under a modified MIT license.

DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.14/$0.28 input, $0.42 output per million tokens.

Google's cheapest Gemini model pairs a 1M-token context window with $0.10/$0.40 per million token pricing, multimodal input, and 359 tokens/second throughput for high-volume production workloads.

Zhipu's GLM-4.7-Flash is a 30B-A3B MoE model that posts 59.2% on SWE-bench Verified and 79.5% on tau2-Bench while running on a single RTX 4090 - MIT licensed and free via the Z.AI API.

Google Gemma 3 27B is a 27B dense multimodal model supporting text and vision with a 128K context window, 140+ languages, and single-GPU deployment - the most capable open model at its size class.

OpenAI's budget API workhorse pairs 128K context with $0.15/$0.60 per million token pricing, solid coding benchmarks, and the broadest third-party ecosystem of any small model.

Meta's Llama 4 Maverick packs 400B total parameters into a 128-expert MoE architecture with only 17B active per token, beating GPT-4o on Chatbot Arena while matching DeepSeek V3 on reasoning at half the active parameters.

Meta's Llama 4 Scout is a 109B-total, 17B-active MoE model with 16 experts and a 10M-token context window - the longest of any open-weight model - with native multimodal support for text and images.