
Arcee Trinity
Arcee Trinity-Large-Thinking is a 400B sparse MoE open-source reasoning model that ranks #2 on PinchBench at $0.85/M output tokens, 28x cheaper than Claude Opus 4.6.

Anthropic's latest flagship model ships with 3x higher resolution vision, a new xhigh effort level, task budgets for cost control, cyber safeguards, and 13% better coding performance at the same $5/$25 pricing.

Three papers from today's arXiv: a joint fix for KV cache bloat and attention cost, new evidence that fine-tuning belongs in the middle of a transformer, and why stronger reasoning hurts behavioral simulation.

Claude Mythos Preview is Anthropic's most capable model - restricted to 50 orgs via Project Glasswing, with 93.9% on SWE-bench Verified and thousands of autonomous zero-day discoveries.

Three new papers: AlphaLab runs autonomous GPU research campaigns, open-weight reasoning models collapse under text reformatting, and HiL-Bench reveals agents can't decide when to ask for help.

xAI's Grok 4.20 replaces the single-model approach with four specialized agents that debate before every answer - a bold architectural bet that pays off in some areas and stumbles in others.

Arcee AI ships Trinity-Large-Thinking, a 398B sparse MoE reasoning model under Apache 2.0 that hits 91.9% on PinchBench for $0.85 per million output tokens on OpenRouter.

Three papers: AI safety measures withhold critical clinical guidance from patients, SAT cuts reasoning tokens by 40%, and conformal prediction blocks wrong multi-agent consensus.

LG AI Research released EXAONE 4.5, a 33B open-weight vision-language model that posts higher STEM scores than GPT-5-mini and Claude 4.5 Sonnet - but a non-commercial license caps its real-world reach.

Three papers expose safety training's moral blind spot, two distinct failure modes inside reasoning models, and a 10x cheaper way to know when a reasoning model is guessing.

Meta's first closed-source frontier model scores 52 on the Artificial Analysis Intelligence Index, leads on HealthBench Hard, and ships free at meta.ai - but has no public API yet.

Meta Superintelligence Labs releases Muse Spark, its first AI model built from scratch in nine months, landing 4th on the Artificial Analysis Intelligence Index.