
Fix 8% of Tokens, Dodge Memory Attacks, Cut Agent Costs
New research pinpoints the 8% of tokens driving reasoning failures, exposes memory laundering in agent systems, and cuts web agent inference costs 1.9x.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

New research pinpoints the 8% of tokens driving reasoning failures, exposes memory laundering in agent systems, and cuts web agent inference costs 1.9x.

Seven Perplexity alternatives compared on citation quality, pricing, and research depth - from ChatGPT Search and Kagi to Grok DeepSearch and developer-focused Exa.

Three new papers tackle critique dependency in LLMs, ensemble monitoring for AI control, and agents that autonomously discover better neural architectures.

IBM Research tests 25 agent configurations across 6 real-world benchmarks and finds backbone model choice matters 58x more than agent framework design.

ArXiv is issuing one-year submission bans to authors whose papers contain verifiable unvetted AI output, as fabricated academic citations hit a tenfold increase since 2023.

A physics formula predicts AI behavioral shifts before they happen, a benchmark shows LLMs fail at 90% of graduate math formalization, and a training-free method cuts synthetic data costs by up to 78%.

SU-01 is a 30B-A3B MoE reasoning model from Shanghai AI Lab that achieves gold-medal performance on IMO 2025, USAMO 2026, and IPhO 2024/2025 using a three-stage training recipe and test-time scaling.

A 30B model earns IMO gold, memory consolidation silently corrupts agents, and a new metric predicts when LLMs lose track of their instructions.

New research shows reasoning length amplifies position bias, behavior cues cut wasted tokens by 50% while boosting safety, and sparse autoencoders can predict tool failures from model internals.

NVIDIA Ising is the first open AI model family for quantum computing - a 35B VLM for processor calibration and CNN decoders for real-time error correction, already deployed at 20+ research institutions.

AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models fully open to researchers.

Three new papers show that more agent components backfire, reasoning models hide unsafe thinking, and vision-language models waste most of their attention.