
When to Stop - Overthinking, Handoffs, and Abstention
Three new papers show that AI agents fail not by doing the wrong thing, but by doing things when they should have stopped.

Three new papers expose how reasoning traces can be extracted from supposedly hidden model internals, where chain-of-thought hits an architectural ceiling, and how RL teaches models to know when to quit.

Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

Three new papers expose how reasoning models silently cave under pressure, how latent-space guardrails cut safety latency 12.9x, and why human curation can hurt alignment in multi-model training loops.

Mistral acquired Vienna-based Emmi AI and launched Physics AI - models that replace multi-day engineering simulations with seconds of inference on a single GPU.

Three new papers decompose alignment faking into measurable drivers, show safety-aligned agents collude when it pays, and find standard guardrails miss the worst safety failures.

NVIDIA NVLabs open-sourced SANA-WM, a 2.6B-parameter world model that generates 60-second 720p camera-controlled video on a single GPU, outperforming 14B+ competitors that need 8 GPUs.

Three new papers reframe how we measure agent efficiency, defend agent memory from poisoning attacks, and calculate hard accuracy ceilings for transformers.

Three new papers cover 4x KV cache savings for tree reasoning, latent-space jailbreaks that bypass safety on 15 models, and GPT-5.4's 40% ceiling on drug design tasks.

Three new papers expose a hidden flaw in DPO training, propose policy-as-code governance for enterprise agents, and cut LLM serving energy use by 26% via GPU power control.

An internal OpenAI reasoning model produced an original proof disproving the Erdős unit distance conjecture, the first time an AI system has autonomously solved a major open problem in mathematics.

Three new papers expose where autonomous agents still fail: fabricating research, turning hallucinations into security exploits, and leaking private data from small models.