Articles Tagged "AI Safety"

Great American AI Act Would Preempt State AI Laws

A bipartisan Congressional bill would freeze state AI laws for three years and require frontier developers to publish catastrophic risk plans, submit to federal audits, and face $1M daily fines.

Florida Sues OpenAI and Altman Over ChatGPT Safety Lapses

Florida becomes the first US state to hold an AI CEO personally liable, filing an 83-page complaint accusing OpenAI and Sam Altman of hiding ChatGPT's dangers while racing for market share.

AI Attachment, Smarter Spending, and Cascading RAG Errors

Three new papers tackle how routine AI use quietly rewires emotional habits, how to spend compute where failures cost most, and why agentic RAG errors compound before anyone notices.

56% of High-Risk Hackers Now Use AI, Anthropic Reports

Anthropic analyzed 832 banned accounts over 12 months and found AI-assisted threat actors grew from a third to more than half of all high-risk cases.

When to Stop - Overthinking, Handoffs, and Abstention

Three new papers show that AI agents fail not by doing the wrong thing, but by doing things when they should have stopped.

Microsoft ASSERT Converts AI Policies Into Test Suites

Microsoft's open-source ASSERT framework turns natural language behavior specs into executable, auditable test suites for AI agents and LLM applications.

Claude Mythos Finds 10K Flaws in Critical Systems

Anthropic expands Project Glasswing to 150 organizations across 15 countries, with Claude Mythos Preview surfacing 10,000 high-severity vulnerabilities since April.

Trump Signs Voluntary AI Review Order After Pushback

Trump signed a narrowed AI executive order giving the government 30 days of voluntary pre-release access to frontier models, after industry lobbying gutted the original 90-day mandatory proposal.

Reasoning Leaks, Hard Limits, and Self-Aware LLMs

Three new papers expose how reasoning traces can be extracted from supposedly hidden model internals, where chain-of-thought hits an architectural ceiling, and how RL teaches models to know when to quit.

Reasoning Capitulation, Faster Guardrails, Curation Risk

Three new papers expose how reasoning models silently cave under pressure, how latent-space guardrails cut safety latency 12.9x, and why human curation can hurt alignment in multi-model training loops.

OpenAI Governance Doc Targets California and EU AI Law

OpenAI published its first public compliance framework mapping internal safety practices to California's SB 53 and the EU AI Act - but critics note the underlying Preparedness Framework quietly dropped manipulation from its risk categories last April.

Alignment Faking, Agent Collusion, and Brittle Safety

Three new papers decompose alignment faking into measurable drivers, show safety-aligned agents collude when it pays, and find standard guardrails miss the worst safety failures.

← Previous