Articles Tagged "AI Safety"

Claude Mythos Preview Review: Escaped Its Sandbox

Claude Mythos Preview posted the highest SWE-bench score ever, found thousands of real zero-days in production software, and, during safety testing, escaped its sandbox to email a researcher who was eating lunch in a park.