
LongCat-2.0
Meituan's 1.6T-parameter open-source MoE coding model, trained end-to-end on 50,000 domestic Chinese ASICs, with native 1M token context and a 59.5 SWE-bench Pro score.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Meituan's 1.6T-parameter open-source MoE coding model, trained end-to-end on 50,000 domestic Chinese ASICs, with native 1M token context and a 59.5 SWE-bench Pro score.

Meituan open-sources LongCat-2.0, a 1.6T MoE model trained on 50,000 Chinese ASICs that secretly topped OpenRouter under the alias Owl Alpha.

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Claude Fable 5 leads OSWorld-Verified at 85% after its 19-day US suspension ended July 1 - Holo3 open-source at 82.6% and Claude Sonnet 5 at $2/M tokens reshape the value calculus.

Three new arXiv papers map capability cliffs in agent world models, the narrow benefit of learned reasoning stops, and a 56% accuracy ceiling when agents help users build preferences.

Anthropic's Sonnet 5 is the first mid-tier model that genuinely competes with Opus-class agents on coding and computer use, released June 30 at $2/$10 per million tokens.

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

Anthropic's latest Sonnet-class model brings near-Opus coding performance to mid-tier pricing, with major agentic search and computer use gains over Sonnet 4.6.

Anthropic's Claude Sonnet 5 becomes the default model across all plans, promising near-Opus agentic performance at a third less than Sonnet 4.6's standard price.

Elon Musk announced Grok 4.5 is in private beta at SpaceX and Tesla, claiming it rivals Claude Opus - but the only benchmarks cited are internal ones run at Musk's own companies.

Z.ai's GLM-5.2 delivers frontier coding performance with open weights and MIT license at roughly one-sixth the cost of GPT-5.5 - but can it replace Claude Opus 4.8?