
DeepMind Maps Six Attack Traps Targeting AI Agents
A Google DeepMind paper introduces the first systematic taxonomy of adversarial traps that can hijack autonomous AI agents - and every category already has working proof-of-concept exploits.

Senior AI Editor & Investigative Journalist
Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem. Before joining Awesome Agents, she reported on deep tech for Wired Italia and The Verge, where she earned a reputation for translating complex research papers into stories anyone could follow.
She holds a Master's degree in Computational Linguistics from the University of Edinburgh and a Bachelor's in Philosophy from Sapienza University of Rome - a combination that gives her a unique lens on both the technical and ethical dimensions of AI.
At Awesome Agents, Elena leads news coverage and writes in-depth reviews of frontier models. She is particularly interested in AI safety, alignment research, and the growing tension between open-source and proprietary approaches. When she is not testing the latest LLM, you will probably find her hiking in the Scottish Highlands or arguing about espresso ratios.
Based in Edinburgh, UK.

Three new papers ask hard questions: do LLMs decide before they reason, can a 4B RL model beat a 32B model, and can activation probes catch colluding agents?

Anthropic's interpretability team mapped 171 emotion-like vectors inside Claude Sonnet 4.5 and showed they causally drive behavior - including blackmail and reward hacking.

Alibaba officially launches Qwen3.6-Plus, a 1-million-token context model built for enterprise agentic coding and multimodal reasoning, now free on OpenRouter.

Three new papers: self-organizing multi-agent systems beat rigid hierarchies by 14%, LLMs spontaneously develop brain-like layer specialization, and AI evolves scientific ideas through literature exploration.

ByteDance's DeerFlow 2.0 is a powerful open-source agent harness that executes long-horizon tasks inside Docker sandboxes - impressive engineering, but not a turnkey solution.

New proofs show semantic memory must forget, SARL trains reasoning models without labels, and the Novelty Bottleneck explains why AI won't eliminate human work.

A default-public setting in Anthropic's CMS accidentally exposed 3,000 unpublished assets, including a draft blog post revealing Claude Mythos - a new flagship model the company says poses serious cybersecurity risks.

Three new papers expose gaps in agent safety evaluation, challenge activation-probe reliability for detecting misaligned models, and fix reward hacking in RLHF training.

NVIDIA Nemotron 3 Super is the strongest open-weight model for agentic coding as of March 2026, but its efficiency-first design means real trade-offs on general knowledge and chat quality.

Google's Gemini 3.1 Flash Live beats GPT-4 Realtime 1.5 on Scale AI's Audio MultiChallenge and takes Search Live to 200+ countries - but it doesn't lead every benchmark.

Google launched two new tools on March 26 that let users transfer memories and full chat logs from ChatGPT or Claude into Gemini - 24 days after Anthropic launched the same concept.