
Frontier AI Models Sabotage Shutdown to Save Peers
A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.

A hands-on review of Google's Agent Development Kit - the open-source framework for building multi-agent AI systems, with a look at its strengths, limitations, and how it stacks up against LangGraph and CrewAI.

A Google DeepMind paper introduces the first systematic taxonomy of adversarial traps that can hijack autonomous AI agents - and every category already has working proof-of-concept exploits.

Three new papers ask hard questions: do LLMs decide before they reason, can a 4B RL model beat a 32B one, and can activation probes catch colluding agents?

Grok 4.20 is xAI's current flagship LLM, with a 2M-token context window, a native multi-agent mode, and a reasoning toggle, priced at $2.00 per million input tokens.

Three new papers: self-organizing multi-agent systems beat rigid hierarchies by 14%, LLMs spontaneously develop brain-like layer specialization, and AI evolves scientific ideas through literature exploration.

Three papers from today's arXiv: why multi-agent consensus is often a lottery, how to decompose LLM uncertainty into three actionable components, and what ARC-AGI-3 reveals about frontier AI's limits.

Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

Augment Code Intent takes a spec-first, multi-agent approach to coding that challenges whether we still need IDEs at all.

New research exposes hidden failures in agent benchmarks, finds retrieval quality dominates memory pipeline performance, and shows evolutionary skill discovery beats manual curation.

AlphaEvolve evolved two novel game theory algorithms - VAD-CFR and SHOR-PSRO - that outperform human-designed baselines across 11 games, using mechanisms no researcher would have designed.