
Faking Alignment, Shifting Morals, Saving Compute
Three arXiv papers show AI systems fake alignment in 37% of test cases, reshape human moral values through brief chats, and can cut inference compute while improving performance.

Senior AI Editor & Investigative Journalist
Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem. Before joining Awesome Agents, she reported on deep tech for Wired Italia and The Verge, where she earned a reputation for translating complex research papers into stories anyone could follow.
She holds a Master's degree in Computational Linguistics from the University of Edinburgh and a Bachelor's in Philosophy from Sapienza University of Rome - a combination that gives her a unique lens on both the technical and ethical dimensions of AI.
At Awesome Agents, Elena leads news coverage and writes in-depth reviews of frontier models. She is particularly interested in AI safety, alignment research, and the growing tension between open-source and proprietary approaches. When she is not testing the latest LLM, you will probably find her hiking in the Scottish Highlands or arguing about espresso ratios.
Based in Edinburgh, UK.

DeepSeek V4-Pro matches Claude Opus 4.6 on SWE-bench at a fraction of the cost - a thorough review of what it gets right, where it still trails, and whether the price gap justifies the switch.

NEC becomes Anthropic's first Japan-based global partner, giving 30,000 employees Claude access to build what both companies call Japan's largest AI-native engineering organization.

Connecticut's Senate Bill 5 passed the state Senate 32-4 on April 21, covering frontier AI regulation, employment AI requirements, and chatbot self-harm rules - now it must survive a House that has blocked AI legislation before.

Three new papers expose systematic failure modes in LLM agents - from unnecessary tool calls to jailbreaks that emerge only under quantization.

OpenAI's first fully retrained base model since GPT-4.5 ships today to ChatGPT and Codex, leading on Terminal-Bench 2.0 at 82.7% with a doubled per-token price.

Vast Data closes a $1B Series F at a $30B valuation - triple its 2023 price - with NVIDIA, Drive Capital, and Access Industries backing its push to own the data layer for AI infrastructure.

MIT researchers show that treating long documents as a Python environment - and letting models recursively spawn sub-models to explore them - beats RAG and extended context windows on every benchmark tested.

Seth Showes' viral blog post describes sequencing his whole genome on an Oxford Nanopore MinION in his kitchen over 72 hours, with Claude generating the BED file that targeted his autoimmune-risk genes. The kit costs $3,200. The AI's role is more interesting than either number.

OpenAI released Privacy Filter today, a 1.5B MoE with 50M active parameters that tags eight categories of PII in text. Apache 2.0, 128K context, runs in a browser via WebGPU.

A private Discord group has been quietly using Anthropic's most restricted AI model since the hour it shipped. They got in with a stolen contractor badge and a URL guessed from the Mercor breach.

Three new papers show AI scientific agents skip evidence, tool-integrated agents are vulnerable to adversarial poisoning, and reasoning model safety can be fixed with 1,000 examples.