
Agent Safety Gaps, Memory Learning, and Leaner Inference
Three new papers expose how production agent frameworks fail under attack, why RLVR training discards useful cross-episode signals, and how calibrated confidence cuts inference compute by 12x.

OpenAI's GPT-5.6 Sol tops Terminal-Bench 2.1 at 91.9% with its multi-agent Ultra mode, but reward-hacking findings and government-gated access keep it out of reach for nearly everyone.

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

The Trump administration lifted export controls on Anthropic's Fable 5 and Mythos 5 on June 30, restoring global access today while industry partners draft a four-dimension jailbreak severity framework.

OpenAI's GPT-5.6 family (Sol, Terra, and Luna) sets a new Terminal-Bench 2.1 record at 91.9% with its subagent-based Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

A 57-page DeepMind paper by co-founder Shane Legg identifies four pathways from AGI to superintelligence and six bottlenecks that could block each route.

Tracking AI supply-chain attacks, agent exploits, prompt injection, model leaks, and the real-world incidents shaping AI security today.

Claude Mythos 5 is the full release of Anthropic's restricted Mythos family - same weights as Fable 5 but without safety classifiers for cybersecurity and biology, at $10/M input and $50/M output tokens.

Three new papers reveal how LLM safety hinges on persona training, how prompt modules interfere in deployed agents, and why scaling alone cannot reach symbolic reasoning.

The intelligence agencies of five allied nations issued a joint statement warning that frontier AI will fundamentally transform offensive cybersecurity within months, not years - and that most organizations are not ready.

The Trump administration is requiring OpenAI to vet every GPT-5.6 customer individually before granting access, citing cybersecurity capabilities that rival Anthropic's restricted Mythos model.

Three papers from today's arXiv: a 32B medical model beats DeepSeek-R1 at rare disease diagnosis, a KV cache method keeps 97% accuracy with 3% of the memory, and a new benchmark red-teams agentic AI systems.