US Ends Fable 5 Ban, Sets Jailbreak Severity Scale

The Trump administration lifted export controls on Anthropic's Fable 5 and Mythos 5 on June 30, restoring global access today while industry partners draft a four-dimension jailbreak severity framework.

US Ends Fable 5 Ban, Sets Jailbreak Severity Scale

Nineteen days after the Commerce Department forced Anthropic to shut down public access to its two most capable models, the Trump administration reversed course. On June 30, Commerce Secretary Howard Lutnick signed off on lifting export controls on Claude Fable 5 and Mythos 5. Global access began rolling back on July 1 across Claude.ai, Claude Platform, Claude Code, Claude Cowork, AWS, Google Cloud, and Microsoft Foundry.

The return didn't come free. Anthropic committed to a set of government oversight obligations that, read carefully, represent a lasting shift in how the company operates - not just a temporary fix.

TL;DR

  • Export controls on Fable 5 and Mythos 5 lifted June 30; global access restores July 1
  • Anthropic agreed to pre-release government access, proactive risk detection, and real-time malicious-activity reporting
  • A new classifier blocks the identified jailbreak technique in over 99% of cases - with more false positives in coding tasks as a trade-off
  • Anthropic, Amazon, Microsoft, Google, and Glasswing partners are drafting a shared framework to score jailbreak severity across four dimensions
  • NSA, Treasury, and CISA have until August 1 to deliver a classified benchmark defining which models trigger government review

The 19-Day Blackout

The June 12 export control directive traced back to a report from Amazon researchers who had found a prompt-framing technique that bypassed Fable 5's original safety guardrails. The Commerce Department cited national security authority and added both models to its restricted technologies list, triggering a near-total shutdown.

Why It Hit So Hard

The wording was broader than most observers expected. The controls applied to any "foreign national" - which meant non-citizen Anthropic employees couldn't access the models from within the company's own offices. For anyone outside the US, the models went dark completely. Anthropic earlier secured partial relief for around 100 trusted partners, but the broader market - developers building on the Claude API, millions of Claude.ai subscribers outside the US - stayed locked out all through.

What the Critics Said

Cybersecurity experts were skeptical from the start. Several noted that the jailbreak Amazon discovered granted only minor, previously known capabilities. The technique didn't cross any threshold that would normally trigger export controls. A number of analysts argued the ban looked less like a security fix and more like political pressure - a way to extract concessions from a company whose executives had been publicly critical of the administration's AI policy.

Anthropic itself downplayed the severity of the original jailbreak, describing it as a known class of vulnerability that posed limited real-world risk. That framing didn't save the models from being taken offline.

Code on a screen showing colorful function and query variables The jailbreak involved manipulating how Fable 5 interpreted code-related prompts to steer it past safety filters. Source: unsplash.com

What Anthropic Agreed To

The deal that unlocked the models carries teeth. Commerce Secretary Lutnick's letter, portions of which Anthropic shared publicly, outlines three core commitments:

"Anthropic has agreed to proactively detect and address security risks associated with the models; to work diligently with the U.S. government on protocols and standards and releases for Mythos, Fable and future models; and to inform the US government of any malicious activity."

Lutnick's letter also warns that restrictions can be reimposed if "circumstances change" or if Anthropic fails to meet its commitments.

Deeper Obligations

The public language understates what Anthropic has given up. Beyond the three bullet points Lutnick listed, Anthropic committed to giving government researchers pre-release access to future models with national security effects, and to providing rapid information sharing on jailbreaks and misuse patterns as they surface.

That second point matters more than it sounds. It means the government gets a real-time window into how Fable 5 and its successors are being exploited, before Anthropic would necessarily choose to disclose anything publicly.

The August 1 Deadline

A detail that's received less attention: NSA, Treasury, and CISA have until August 1, 2026 to deliver a classified benchmark. That benchmark is meant to define which models trigger government review and what standards they must meet before deployment. The line between a normal frontier model release and one requiring government sign-off is still being drawn.

The New Classifier and Its Trade-offs

Anthropic didn't just negotiate its way out - it shipped a technical fix. The company trained a new safety classifier targeting the specific prompt-framing method Amazon's researchers used to bypass Fable 5's original guardrails.

The 99% Block Rate

Researchers at the Department of Commerce's Center for AI Standards and Innovation (CAISI) independently tested the classifier and confirmed it blocks the attack method in over 99% of cases. That's a high success rate, but Anthropic stated the trade-off plainly: the classifier also flags more benign requests, especially during routine coding and debugging tasks. When a request is blocked, the user is redirected to Claude Opus 4.8 instead.

Anthropic described this as a "defense in depth" approach - multiple safety layers working together, with the classifier intentionally set to a conservative threshold. The open question is whether the technique Amazon found is uniquely dangerous, or just one of many similar approaches that haven't yet appeared in a published report.

A brass padlock resting on a laptop keyboard with red and green lighting Anthropic's new classifier adds a layer between user prompts and model responses, blocking the known attack pattern in over 99% of tested cases. Source: unsplash.com

An Industry Blueprint for Jailbreak Severity

The most consequential announcement on July 1 isn't the model restoration. It's the jailbreak severity framework Anthropic is proposing with Amazon, Microsoft, Google, and other Glasswing consortium partners.

Four Dimensions

The framework scores jailbreaks across four criteria:

DimensionWhat It Measures
Capability gainHow far the jailbreak reaches beyond tools already available to an attacker
BreadthHow many distinct offensive tasks the same technique enables
Ease of weaponizationHow much human effort is required to turn the technique into a working attack
DiscoverabilityHow widely known the technique already is, or how easily someone could find it

The four-dimension structure aims to distinguish between a jailbreak that lets someone write slightly edgier fiction - low severity across all four - and one that hands a threat actor novel, easy-to-use, hard-to-discover capability against critical infrastructure, which would score high on every dimension.

24/7 Monitoring and a Bug Bounty

For the most severe category - jailbreaks being actively used against critical infrastructure - Anthropic committed to deploying mitigations the moment severity is confirmed. A dedicated team now monitors jailbreak submission channels around the clock.

Anthropic also launched a new HackerOne bug bounty program. Security researchers can submit potential Fable 5 jailbreaks for formal review and earn rewards for valid, novel submissions. This is the first dedicated bounty track Anthropic has created specifically for cybersecurity jailbreaks, as distinct from its existing programs for general model safety issues.

The Real Pressure Behind the Reversal

White House Chief of Staff Susie Wiles framed the decision in clean terms: the administration wanted to "get the best tech deployed as quickly and safely as possible." The fuller picture involves competitive pressure the government couldn't ignore.

Asian Competitors Closing the Gap

The export ban's biggest unintended effect was accelerating Asian competitors. Chinese AI labs released Fugu and Tulongfeng - two models researchers describe as approaching Mythos-level capabilities - during the period Anthropic's models were offline. With American frontier AI absent from global markets for three weeks, foreign alternatives moved into the gap.

Keeping Fable 5 and Mythos 5 locked down meant sacrificing competitive ground to enforce a security requirement that prominent experts questioned. That math stopped working for the administration.


The jailbreak severity framework is what this deal will be remembered for, if it holds together. Right now, government decisions about which models need review happen case by case, with no published criteria. A shared four-dimension scoring system agreed on by Anthropic, Amazon, Microsoft, and Google would give regulators something concrete to point at. What Anthropic gave up to get Fable 5 back - pre-release government access, real-time reporting on exploits, a letter warning of reimposition - is real and not temporary. Whether the safety infrastructure it's building in return justifies those concessions is the question the industry will be arguing about long after the ban is forgotten.

Sources:

Elena Marchetti
About the author Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.