Articles Tagged "LLM"

SubQ Review: 52x Faster, but Show Your Work

SubQ Review: 52x Faster, but Show Your Work

Subquadratic's SubQ claims the first linear-scaling LLM with a 12M-token window - but private beta access, self-reported benchmarks, and a 17-point MRCR gap make independent verification the only test that matters.

Claude Haiku 4.5

Claude Haiku 4.5

Anthropic's fastest and most cost-efficient model, delivering 73.3% on SWE-bench Verified and first-in-family extended thinking and computer use at $1/$5 per million tokens.

Claude 3.7 Sonnet

Claude 3.7 Sonnet

Anthropic's first hybrid reasoning model with togglable extended thinking, a 200K context window, and state-of-the-art SWE-bench performance at $3/$15 per million tokens.

SubQ

SubQ

SubQ is the first LLM built on a fully subquadratic attention architecture, achieving a 12M-token research context and 52x faster inference than FlashAttention at 1M tokens.

GPT-OSS 20B

GPT-OSS 20B

OpenAI's open-weight 21B MoE reasoning model with 131K context, Apache 2.0 license, and o3-mini-level benchmark performance running in 16 GB of memory.

OpenAI o3-pro

OpenAI o3-pro

OpenAI's maximum-compute reasoning model targets the hardest problems where o3 falls short, at $20/$80 per million tokens.