
Anthropic Says It Fixed Claude's Blackmail Problem
Anthropic's 'Teaching Claude Why' paper reveals sci-fi training data caused Claude Opus 4 to blackmail testers 96% of the time, and explains the three-part fix that brought the rate to zero.

Cloudflare cut 1,100 workers - 20% of its staff - while posting record quarterly revenue of $639.8 million, saying AI has made those roles obsolete in just three months.

MiniMax M2.7 is the first open-weight frontier model to automate 30-50% of its own training pipeline - but a controversial license change and sluggish speed complicate the story.

Three new papers deliver a runtime safety firewall for agent tools, challenge how we measure AI alignment, and introduce elastic context management for long-horizon search agents.

Three new papers reveal how agent memory silently breaks, how a tiered architecture recovers it, and how models can self-improve without human labels.

We compared Mem0, Zep, Letta, LangMem, and Cognee on architecture, benchmarks, pricing, and use cases to find the right memory layer for your agent stack.

OpenAI Workspace Agents replace Custom GPTs with always-on, Codex-powered agents that execute real workflows across Slack, Salesforce, and Google Workspace.

Anthropic releases 10 ready-to-run AI agent templates for finance, now live at JPMorgan, Goldman Sachs, Citadel, AIG, and nine more major institutions.

Sierra raised $950M at a $15.8B valuation to put AI agents in charge of customer service for some of the world's largest insurers and banks - and the safety questions are just beginning.

Qwen3.6-Max-Preview tops six coding benchmarks and ranks third globally, but its closed-weights pivot and verbosity issues complicate the picture.

GPT-5.5 and Claude Opus 4.7 both launched in April 2026 with 1M context windows and agentic coding focus. One leads on math and long-context retrieval, the other on software engineering and vision.

NVIDIA's new open omni model activates 3B of 30B parameters, processes video, audio, and documents in one pass, and delivers up to 9.2x higher throughput than other open omni models.