Articles Tagged "AI Safety"

Olah Said AI Feels Emotions at the Vatican - Does It?

Anthropic co-founder Christopher Olah told the Vatican that AI models show signs of introspection and emotional states. We checked what the research actually supports.

Agent Energy Costs, Memory Attacks, and Compute Limits

Three new papers reframe how we measure agent efficiency, defend agent memory from poisoning attacks, and calculate hard accuracy ceilings for transformers.

Pope Leo XIV's AI Encyclical Targets Autonomous Weapons

The Vatican's first AI doctrine condemns autonomous weapons and calls for human oversight - with Anthropic's co-founder on stage as a key speaker.

Smarter Trees, Hidden Attacks, Drug Design Gaps

Three new papers cover 4x KV cache savings for tree reasoning, latent-space jailbreaks that bypass safety on 15 models, and GPT-5.4's 40% ceiling on drug design tasks.

Google AI Overviews Treat 'Disregard' as a Command

Google's new AI Overviews respond to words like 'disregard,' 'ignore,' and 'dismiss' as LLM instructions rather than vocabulary queries, leaving users with blank search results.

Trump Pulls AI Security Order Hours Before Signing

Trump scrapped a White House AI executive order signing ceremony at the last minute, citing concerns about US competitiveness - even as Anthropic's Mythos and OpenAI's GPT-5.5-Cyber showed AI can now find and exploit zero-days at scale.

Where AI Agents Break: Research, Safety, and Privacy

Three new papers expose where autonomous agents still fail: fabricating research, turning hallucinations into security exploits, and leaking private data from small models.

Fix 8% of Tokens, Dodge Memory Attacks, Cut Agent Costs

New research pinpoints the 8% of tokens driving reasoning failures, exposes memory laundering in agent systems, and cuts web agent inference costs 1.9x.

Suleyman Claims AI Takes White-Collar Jobs in 18 Months

Microsoft AI CEO Mustafa Suleyman says professional jobs face automation within 18 months. The data from independent studies tells a different story.

Self-Correcting Models, Smarter Monitors, AI Designs Itself

Three new papers tackle critique dependency in LLMs, ensemble monitoring for AI control, and agents that autonomously discover better neural architectures.

ChatGPT Gets Bank Access - Day After Data Lawsuit Filed

OpenAI launched ChatGPT Personal Finance on May 15, giving Pro users read access to 12,000+ banks via Plaid - one day after a class action alleged OpenAI shared user conversations with Meta and Google.

arXiv Hits Researchers With 1-Year Ban for AI Slop

arXiv is issuing one-year submission bans to authors whose papers contain verifiably unvetted AI output, as fabricated academic citations have risen tenfold since 2023.
