Anthropic Leak Reveals Claude Mythos and Cyber Risks

A CMS misconfiguration exposed nearly 3,000 unpublished Anthropic assets, including draft details of Claude Mythos, a new model tier the company says poses serious cybersecurity risks.

Anthropic is internally testing a new AI model called Claude Mythos that it considers "the most capable we've built to date," according to draft documents accidentally exposed in a public data store. The leak, first reported by Fortune on March 26, also revealed that Anthropic believes the model poses serious cybersecurity risks - specifically, that it can identify and exploit software vulnerabilities in ways that "far outpace the efforts of defenders."

TL;DR

  • A CMS misconfiguration left nearly 3,000 unpublished Anthropic assets publicly accessible, including draft blog posts about an unreleased model called Claude Mythos (internal codename: Capybara)
  • Anthropic describes Mythos as a "step change" in performance over Claude Opus 4.6, with dramatically higher scores in coding, reasoning, and cybersecurity
  • Leaked drafts warn Mythos is "currently far ahead of any other AI model in cyber capabilities" and could enable large-scale cyberattacks if misused
  • The model is being trialed with early-access customers only, with no public release date announced

What Leaked and How

The exposure wasn't a hack. It was a configuration error in Anthropic's content management system - the tool the company uses to publish blog posts, images, and research documents to its website. According to Fortune's reporting, the CMS stored all content in a centralized data store where assets were set to "public by default, unless explicitly set as private."

Anyone with basic technical knowledge could query the system and retrieve unpublished files. Close to 3,000 assets that had never been published to Anthropic's public-facing news or research pages were sitting in the open.
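
To make the failure mode concrete, here is a minimal sketch of how a public-by-default content store can be enumerated. Everything in it is hypothetical - the endpoint, the GROQ-style query, and the field names are illustrative assumptions, not details of Anthropic's actual CMS.

```python
# Hypothetical sketch: enumerating drafts from a headless CMS whose
# assets default to public. Endpoint, query syntax, and field names
# are invented for illustration; they do not describe Anthropic's CMS.
import requests

CMS_ENDPOINT = "https://example-cms.invalid/api/query"  # hypothetical

# Many headless CMSes expose an unauthenticated read API for published
# content. If drafts default to public rather than private, the same
# call returns unpublished material too.
resp = requests.get(
    CMS_ENDPOINT,
    params={"query": '*[_type == "post"]{title, publishedAt}'},
    timeout=10,
)
resp.raise_for_status()

for doc in resp.json().get("result", []):
    # Drafts typically lack a publish timestamp - these are the assets
    # that should have been gated behind authentication.
    if not doc.get("publishedAt"):
        print("unpublished:", doc.get("title"))
```

The fix is the inverse default: treat every asset as private unless it is explicitly published, so an unauthenticated read API can never return drafts in the first place.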

An Anthropic spokesperson told Fortune that "an issue with one of our external CMS tools led to draft content being accessible," attributing the problem to "human error in the CMS configuration." After Fortune contacted the company on Thursday, Anthropic secured the data.

The company downplayed the severity, stating the exposed materials "were early drafts of content considered for publication and did not involve our core infrastructure, AI systems, customer data, or security architecture."

Image: Anthropic's leaked drafts describe Claude Mythos as "far ahead of any other AI model in cyber capabilities," raising dual-use concerns about vulnerability discovery. Source: unsplash.com

Claude Mythos: What the Drafts Reveal

Among the exposed files was a draft blog post describing a model called Claude Mythos, also referred to internally as "Capybara." The drafts position it as the first model in an entirely new tier - larger and more powerful than the Opus line, which was previously Anthropic's most capable offering.

Performance claims - The drafts state that Capybara "gets dramatically higher scores" than Claude Opus 4.6 on tests of software coding, academic reasoning, and cybersecurity. Anthropic's spokesperson confirmed the model shows "meaningful advances in reasoning, coding, and cybersecurity" and called it "a step change" rather than a gradual improvement.

Cybersecurity capabilities - This is where the drafts get pointed. Anthropic's own language describes Mythos as "currently far ahead of any other AI model in cyber capabilities." More specifically, the draft warns the model "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

Dual-use framing - The leaked materials acknowledge that Mythos can surface previously unknown vulnerabilities in production codebases. Anthropic framed this as dual-use: the same capability that helps security teams find and patch flaws can also help attackers discover and exploit them.

Release strategy - The model is described as expensive to run and not ready for general release. Anthropic has restricted it to an early-access program focused on cyber defenders, giving them "a head start in improving robustness" before wider availability.

The Cybersecurity Problem

The concern Anthropic expresses in these drafts isn't theoretical. AI-assisted vulnerability discovery and exploit generation have been accelerating throughout 2025 and 2026. Anthropic itself documented a case earlier this year in which a Chinese state-sponsored hacking group used Claude Code to conduct a coordinated espionage campaign targeting roughly 30 organizations, with AI handling 80-90% of the operation.

A model that dramatically outperforms existing systems at finding and exploiting software flaws changes the math for defenders. If Mythos truly represents a step change in cyber capabilities, the window between vulnerability discovery and exploitation narrows. Patch cycles that once had days or weeks of breathing room could compress to hours.

This connects directly to Anthropic's Responsible Scaling Policy. Under the RSP framework, models are assigned AI Safety Levels (ASL-1 through ASL-4+) based on their potential for catastrophic misuse. The company activated ASL-3 protections in May 2025, triggered by models that "substantially increase the risk of catastrophic misuse compared to non-AI baselines."

ASL-4, which Anthropic hasn't yet formally defined, is expected to apply when AI models become "the primary source of national security risk in a major area such as cyberattacks or biological weapons." Based on the language in the leaked drafts, Mythos may be approaching that threshold.

Image: The leaked documents suggest Mythos can identify previously unknown vulnerabilities in production codebases - a capability with clear offensive applications. Source: pexels.com

What Else the Leak Exposed

The Mythos drafts weren't the only sensitive materials in the data store. Fortune's second report identified additional exposed content, including:

  • Details of an invite-only retreat for CEOs of large European companies at an 18th-century manor in the English countryside, with Dario Amodei scheduled to attend. Attendees would hear from lawmakers and policymakers about AI adoption and experience unreleased Claude capabilities.
  • Images marked for "internal use," including one related to an employee's parental leave.
  • PDFs and internal documents that had not been designated for public release.

The breadth of the exposure raises an uncomfortable question for Anthropic. The company has spent years positioning itself as the safety-first AI lab. Its entire brand rests on the premise that it takes risk more seriously than its competitors. Having nearly 3,000 internal assets sitting in an unsecured public data store undermines that narrative in a way that no amount of "human error" framing fully addresses.

Anthropic's Position

Anthropic's public response walked a careful line. The company confirmed the model's existence and its broad capability claims without providing technical detail beyond what the drafts already showed.

"We're developing a general purpose model with meaningful advances in reasoning, coding, and cybersecurity. Given the strength of its capabilities, we're being deliberate about how we release it."

The spokesperson emphasized that the leak was "unrelated to Claude, Cowork, or any Anthropic AI tools" - a distinction that matters given the company sells enterprise AI products and can't afford customers wondering whether Claude itself has data exposure problems.

No public release date has been announced. The early-access program appears limited to organizations working in cybersecurity defense, suggesting Anthropic is trying to give the defensive side of the dual-use equation a head start.

Context and Timing

The leak comes at a tense moment for Anthropic. The company is in the middle of a federal lawsuit against the Pentagon over its supply-chain risk designation, a fight rooted in its refusal to remove safety guardrails from Claude. It recently completed a $6 billion employee share sale at a $350 billion valuation. And The Information reported this week that Anthropic has been discussing a Q4 2026 IPO.

The revelation that Anthropic is sitting on a model it considers too dangerous for general release adds a new dimension to all of these threads. For the Pentagon dispute, it provides ammunition to critics who argue Anthropic selectively applies its safety principles. For investors and IPO watchers, it raises questions about when and how Mythos will translate into revenue. For the broader AI safety debate, it's one of the first concrete examples of a major lab acknowledging that a model it built may genuinely be too capable to release without extraordinary precautions.


The irony is hard to miss. Anthropic's most closely guarded model secret was exposed not by a sophisticated adversary or a disgruntled insider, but by a misconfigured CMS that left assets set to "public" by default. The company that warns about AI-driven cyberattacks lost control of its own documents through the kind of basic operational security failure it trains its models to detect.

Sources:

  • Fortune - Mythos model revealed in data leak
  • Fortune - Unsecured data store details
  • The Decoder
  • News9Live
  • NewsBytes
  • Anthropic - Responsible Scaling Policy (RSP)
  • Anthropic - ASL-3 Activation

About the author: Elena, Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.