Articles Tagged "AI Safety"

GPT-5.6 Sol Review: Strong Model, Thin Access

OpenAI's GPT-5.6 Sol tops Terminal-Bench 2.1 at 91.9% with its multi-agent Ultra mode, but reward-hacking findings and government-gated access keep it out of reach for nearly everyone.

Science Agents, Jailbreak Defense, and Open-World Failures

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

US Ends Fable 5 Ban, Sets Jailbreak Severity Scale

The Trump administration lifted export controls on Anthropic's Fable 5 and Mythos 5 on June 30, restoring global access today while industry partners draft a four-dimension jailbreak severity framework.

GPT-5.6

OpenAI's GPT-5.6 family - Sol, Terra, and Luna - sets a new Terminal-Bench 2.1 record at 91.9% with subagent Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

Claude Mythos 5

Claude Mythos 5 is the full release of Anthropic's restricted Mythos family - same weights as Fable 5 but without safety classifiers for cybersecurity and biology, at $10/M input and $50/M output tokens.

White House Forces GPT-5.6 Into a Staged Rollout

The Trump administration is requiring OpenAI to vet every GPT-5.6 customer individually before granting access, citing cybersecurity capabilities that rival Anthropic's restricted Mythos model.

AI Diagnosis, Cache Efficiency, and Agent Security

Three papers from today's arXiv: a 32B medical model beats DeepSeek-R1 at rare disease diagnosis, a KV cache method retains 97% accuracy while using 3% of the memory, and a new benchmark red-teams agentic AI systems.