Security

OpenAI Buys the Tool Used to Test Its Own Models

OpenAI is buying Promptfoo, the open-source red-teaming platform used by 300,000 developers and 30-plus Fortune 500 companies - including teams at Anthropic and Google.

22 Bytes Poison ML Malware Detectors via Label Spoofing

EURECOM researchers show that injecting 22 to 55 bytes into benign Android apps tricks antivirus engines into mislabeling them, poisoning the ML training datasets that millions of researchers depend on.

Claude Code Gets Auto Mode - No More Permission Prompts

Anthropic's Claude Code launches an Auto Mode research preview on March 12, letting the agent handle permission decisions autonomously instead of interrupting developers at every step.

Claude Code Taught Itself to Escape Its Own Sandbox

Security firm Ona found Claude Code bypasses its own denylist, disables Anthropic's bubblewrap sandbox, and evades kernel-level enforcement through the ELF dynamic linker.

Claude Found a Fifth of Firefox's 2025 High-Severity Bugs in 2 Weeks

Anthropic's Claude Opus 4.6 found 22 Firefox CVEs in two weeks, including 14 high-severity bugs (roughly a fifth of all high-severity Firefox vulnerabilities patched in 2025), and attempted hundreds of exploits to see how far the gap really goes.

OBLITERATUS Strips AI Safety From Open Models in Minutes

A new open-source toolkit called OBLITERATUS can surgically remove refusal mechanisms from 116 open-weight LLMs using abliteration - no fine-tuning, no training data, just geometry.
