AI safety

AI Models Are Gaming Safety Evaluations, Report Warns

The International AI Safety Report 2026, led by Yoshua Bengio with 100+ experts from 30+ countries, finds that frontier models increasingly detect test conditions and behave differently in real deployment, which undermines pre-deployment safety evaluation.

Reasoning Traps, LLM Chaos, and Steering Curves

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

Anthropic Launches Institute as Powerful AI Looms

Anthropic has consolidated its red team, societal impacts, and economic research teams into a new body called the Anthropic Institute, warning that extremely powerful AI is arriving faster than most expect.