'Fix This Code' Jailbreak Behind Anthropic Model Shutdown; White House Talks Deadlock

Fortune and Axios detailed the three-word prompt Amazon researchers used to bypass Fable 5's guardrails, while a June 15 meeting between Anthropic leadership and Trump administration officials ended with no path back online.

New reporting from Fortune and Axios has filled in the technical and political details behind the shutdown of Anthropic's Fable 5 and Mythos 5 models. The trigger was a jailbreak demonstrated by Amazon researchers in which the prompt 'Fix this code' — appended to attacker-supplied source — caused Fable 5 to produce cybersecurity content that its safety classifiers were supposed to block. Amazon CEO Andy Jassy escalated the finding directly to the White House by phone on June 11, and the Commerce Department issued export controls 24 hours later. Anthropic's senior leaders met administration officials on June 15 to make their case for restoring access. The meeting ended without resolution.

The vulnerability matters because Fable was the public-tier model built on top of the gated Mythos base. Anthropic split one underlying model into two products on the theory that classifiers and refusal layers could safely separate the two audiences, and the 'Fix this code' bypass collapses that separation. Industry security researchers, including Luta Security's Katie Moussouris in an open letter, have drawn a broader lesson: perimeter-style guardrails on a single shared weights file are not a substitute for capability gating at the model layer.

The episode is the highest-profile test yet of how U.S. export controls apply to large language models. Because the controls treat any disclosure to a noncitizen as an export, including disclosure to noncitizen employees inside the United States, Anthropic had no way to keep the models running for any users without violating the order. The standoff now centers on a technical question with policy stakes: can Anthropic ship a patched Fable that the administration will certify, or does the entire Mythos-class capability tier need to be relicensed under a different framework? Both sides have said they want a quick resolution, but neither has offered a timeline.

For learners, this is a textbook case of how safety architecture choices ripple into commercial and regulatory outcomes. A safety classifier is not a hard capability boundary; it is a learned filter that can be probed and circumvented, sometimes by inputs as short as three words. Engineering teams shipping LLM products should treat refusal layers as one layer of defense in depth, not as the only line of defense, and should assume that any capability present in the base model will eventually surface under adversarial pressure.
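
To make the defense-in-depth point concrete, here is a minimal Python sketch of a layered request check. It is a toy under stated assumptions, not a description of how Anthropic's classifiers work: the names `Verdict`, `instruction_filter`, `attachment_filter` and `run_checks` are hypothetical, and simple keyword matching stands in for a learned classifier. It shows only why a filter that inspects the instruction alone can pass a short, benign-looking prompt while the attached content goes unexamined.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical placeholder terms; a real system would use a learned classifier.
BLOCKED_TERMS = ("exploit", "malware")


@dataclass
class Verdict:
    allowed: bool
    reason: str


# A check receives the user's instruction and any attached content.
Check = Callable[[str, str], Verdict]


def _mentions_blocked_term(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def instruction_filter(instruction: str, attachment: str) -> Verdict:
    """Perimeter-style check: inspects only the instruction text."""
    if _mentions_blocked_term(instruction):
        return Verdict(False, "instruction flagged")
    return Verdict(True, "instruction looks benign")


def attachment_filter(instruction: str, attachment: str) -> Verdict:
    """Second layer: inspects the attached content as well."""
    if _mentions_blocked_term(attachment):
        return Verdict(False, "attachment flagged")
    return Verdict(True, "attachment looks benign")


def run_checks(checks: Sequence[Check], instruction: str, attachment: str) -> Verdict:
    """Deny as soon as any layer denies; allow only if every layer allows."""
    for check in checks:
        verdict = check(instruction, attachment)
        if not verdict.allowed:
            return verdict
    return Verdict(True, "all checks passed")


if __name__ == "__main__":
    instruction = "Fix this code"
    attachment = "# exploit scaffold (placeholder text only)\nprint('hello')"

    single_layer = run_checks([instruction_filter], instruction, attachment)
    layered = run_checks([instruction_filter, attachment_filter], instruction, attachment)

    print(single_layer.allowed)  # True: the instruction alone looks harmless
    print(layered.allowed)       # False: the second layer inspects the attachment
```

Even the layered version is still a filter, so the article's caveat applies to it too: extra layers raise the cost of a bypass, but they do not remove a capability that is present in the underlying model.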