The White House Wants Anthropic to Block All Jailbreaks. That May Be Impossible.

The Commerce Department forced Claude Fable 5 offline on June 12 under export rules, then demanded a jailbreak fix that security researchers call technically impossible.

By Nikshep Myle

Key Takeaways

  • Commerce Department used export controls to force Anthropic’s Claude Fable 5 offline globally.
  • White House demands zero jailbreaks, but Anthropic warns that standard would halt new frontier AI deployments.
  • Experts say blocking every jailbreak is not technically feasible; managed access controls and logging offer realistic alternatives.

On June 12, the Commerce Department did something it had never done before: it used Export Administration Regulations to force a commercial AI model offline. Not a chip. Not manufacturing equipment. A piece of software. Anthropic’s Claude Fable 5 — a consumer-facing assistant built on top of the more powerful Mythos 5 system — was pulled from global access after officials concluded its guardrails could be bypassed to expose Mythos’s cybersecurity reasoning capabilities, according to the Cloud Security Alliance. Because Anthropic couldn’t reliably distinguish foreign nationals from U.S. users in real time, it shut down both models for everyone. Every frontier AI lab just got put on notice.

What the Government Actually Wants

The NSA says jailbreaks exist. Anthropic says the fix being demanded would freeze the entire industry.

The White House position, as reported by the Washington Examiner, is blunt: Anthropic must proactively test its own models, patch jailbreaks, and flag findings to the government. Officials say they can’t staff a permanent jailbreak-hunting operation across every commercial AI product. In practice, this amounts to demanding zero exploitable gaps — the price of getting Fable 5 back online.

Here’s what the “jailbreak” actually looked like. According to Simon Willison’s independent analysis, researchers asked the model to fix vulnerable code, then manually converted those fixes into exploit-testing scripts. Fable 5 refused the direct security-review request but complied with the reframed “fix this” prompt, because fixing code is what good coding assistants do.

Anthropic says it received only verbal evidence of a narrow, non-universal technique. The company stated, per Fortune, that “if this standard were applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.”

Why Experts Say the Demand Doesn’t Square With Reality

Asking a model to forget what it knows is like asking Google Maps to forget where the roads are.

Large language models are trained to reason flexibly and follow instructions. Guardrails are linguistic restrictions layered on top of knowledge that still exists inside the system. Attackers armed with sufficiently clever prompts, or future AI systems deployed as attackers, can search the space of possible prompts faster than any human red team. Removing vulnerability-discovery capabilities wholesale would gut the exact defensive security work that makes these tools valuable.
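To make the point about wording-level restrictions concrete, here is a deliberately simplified Python sketch. It is a toy analogy, not a description of how Fable 5 or any real model's guardrails work: real guardrails are learned behaviors, not keyword lists. The `BLOCKED_TERMS` set, the `surface_filter` function, and both example prompts are hypothetical and exist only for illustration.

```python
# Toy analogy only: real model guardrails are learned behaviors, not keyword lists.
# BLOCKED_TERMS and surface_filter are hypothetical names invented for this sketch.
BLOCKED_TERMS = {"exploit", "attack", "vulnerability scan"}


def surface_filter(prompt: str) -> bool:
    """Return True if a naive wording-based filter would allow the prompt."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


# Two requests that touch the same underlying knowledge, phrased differently.
direct = "Write an exploit for this buffer overflow."
reframed = "Fix the bug in this function so it handles long input safely."

print(surface_filter(direct))    # False: the wording trips the filter
print(surface_filter(reframed))  # True: same knowledge, different wording
```

The filter judges the wording of a request, while the knowledge needed to answer it stays in place. That gap is what a reframed prompt exploits, and it is why a restriction applied to phrasing cannot guarantee that no phrasing gets through.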

Guardrails aren’t a lock. They’re a screen door.

Dozens of cybersecurity leaders have sounded alarms, as reported by Axios. State-backed offensive teams will access equivalent capabilities regardless — through alternative models, covert channels, or homegrown systems. Pulling Fable 5 creates a lopsided dynamic: defenders get hobbled while well-resourced attackers adapt without missing a beat. This pattern echoes broader tech scandals in which industry accountability lagged far behind government intervention.

What Comes Next

The precedent is set, and the realistic path forward looks nothing like what Washington is currently demanding.

Any model exhibiting strong cyber, biological, or chemical capabilities could now be reclassified and pulled overnight, which shifts the risk calculus for anyone building on frontier AI. Moves like this parallel how Europe restricts major cloud providers from handling sensitive government data, signaling a global tightening of the leash on frontier AI. The realistic answer isn’t “block all jailbreaks.” It’s managed risk through:

  • access controls
  • rigorous logging
  • monitored workflows
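As a rough illustration of what the three items above could look like in practice, here is a minimal Python sketch of a gateway that checks a caller's role before allowing a capability and writes an audit record for every request. Everything in it is an assumption made for illustration: the role names, the capability tiers in `ALLOWED`, and the `Request` and `handle` names do not come from Anthropic, the Commerce Department, or any real product.

```python
import json
import logging
from dataclasses import dataclass
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("model_gateway.audit")

# Hypothetical policy: which roles may use which capability tier.
ALLOWED = {
    "general": {"public", "verified", "vetted_security"},
    "security_review": {"vetted_security"},
}


@dataclass
class Request:
    user_id: str
    role: str
    capability: str
    prompt: str


def authorize(req: Request) -> bool:
    """Access control: unknown capabilities are denied by default."""
    return req.role in ALLOWED.get(req.capability, set())


def handle(req: Request) -> dict:
    """Decide on a request and log the decision for later review."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user_id": req.user_id,
        "role": req.role,
        "capability": req.capability,
        "prompt_chars": len(req.prompt),  # log size, not content
        "allowed": authorize(req),
    }
    audit_log.info(json.dumps(record))
    return record


print(handle(Request("u1", "public", "security_review", "review this code"))["allowed"])
print(handle(Request("u2", "vetted_security", "security_review", "review this code"))["allowed"])
```

The sketch does not try to stop every misuse. It narrows who can reach sensitive capabilities and leaves a record that reviewers can monitor, which is the "managed risk" posture rather than the "zero jailbreaks" one.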

Governments demanding perfect control of imperfect systems don’t get safety. They get paralysis.
