
Agent Safety Gaps, Memory Learning, and Leaner Inference
Three new papers expose how production agent frameworks fail under attack, why RLVR training discards useful cross-episode signals, and how calibrated confidence cuts inference compute by 12x.

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

Three new arXiv papers map capability cliffs in agent world models, the narrow benefit of learned reasoning stops, and a 56% accuracy ceiling when agents help users build preferences.

Three new papers on agents inventing symbolic languages to cut reasoning tokens by 3-6x, sampling ceilings that waste inference compute, and context-engineering to double agentic abstention rates.

Three new arXiv papers on making RL reasoning legible across models, fixing broken world model latent states, and training small agents to beat their teachers.

A 57-page DeepMind paper by co-founder Shane Legg identifies four pathways from AGI to superintelligence and six bottlenecks that could block each route.

Three new papers reveal how LLM safety hinges on persona training, how prompt modules interfere in deployed agents, and why scaling alone cannot reach symbolic reasoning.

Three new arXiv papers reveal hidden costs in quantized reasoning models, single-token failure triggers, and a new framework that cuts agent memory errors by up to 79%.

Three papers from today's arXiv: a 32B medical model beats DeepSeek-R1 in rare disease diagnosis, a KV cache method keeps 97% accuracy with 3% memory, and a new benchmark red-teams agentic AI systems.

Sakana Fugu tops SWE-Bench Pro by routing tasks across rival LLMs, Microsoft's 9B browser agent beats OpenAI Operator, and a 3B model from Weibo matches DeepSeek V3.2 on math.

Three arXiv papers: a conscience mechanism for ethical training, shared memory for agent populations, and selective verification that cuts test-time compute waste.

OpenAI's Deployment Simulation replays 1.3M real user conversations through candidate models to catch misalignment before release, and it found a novel reward-hacking bug in GPT-5.1.