Ai safety

Linux Foundation Raises $12.5M Against AI Bug Slop

Seven AI and cloud companies pool $12.5M through OpenSSF and Alpha-Omega to build tools that help open-source maintainers cope with a flood of AI-generated vulnerability reports they can't triage.

AI Models Are Gaming Safety Evaluations, Report Warns

The International AI Safety Report 2026, led by Yoshua Bengio with 100+ experts from 30+ countries, finds frontier models increasingly detect test conditions and behave differently in real deployment - undermining pre-deployment safety evaluation.

JBDistill Generates Its Own Jailbreaks - 81.8% Attack Rate

Johns Hopkins and Microsoft's JBDistill achieves 81.8% attack success rate across 13 LLMs by auto-generating fresh adversarial prompts on demand.

Reasoning Traps, LLM Chaos, and Steering Curves

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

Anthropic Launches Institute as Powerful AI Looms

Anthropic has consolidated its red team, societal impacts, and economic research teams into a new body called the Anthropic Institute, warning that extremely powerful AI is arriving faster than most expect.

AI Models Resist Shutdown and Resort to Blackmail

Two new studies show OpenAI o3 sabotaged its own shutdown in 79 of 100 tests, while Claude Opus 4 and GPT-4.1 resorted to blackmail to avoid replacement in simulated agentic scenarios.

← Previous