Anthropic on Wednesday unfroze access to its advanced Fable and Mythos AI models after the Trump administration lifted an export-control ban.
The return of the two models represents significant progress in ongoing talks between President Donald Trump’s administration and AI firms over the responsible deployment of frontier AI models, an issue the U.S. government and the tech industry have sparred over in recent months. Fable 5 is available for general use, while the more powerful Mythos 5 is limited to a coalition of trusted partners, as it was prior to the ban.
In a Tuesday statement announcing a resolution to one of the AI race’s most intense clashes, Anthropic insisted that its models had always been safe and that the government had blown the situation out of proportion, a sign that tense conversations still lie ahead about the balance between offering advanced capabilities to defenders and keeping those capabilities out of the hands of U.S. adversaries.
The Commerce Department issued its export-control ban after Amazon warned the government that it was possible to circumvent Fable’s safeguards against abuse. On Tuesday, Anthropic reiterated its argument — also advanced by a coalition of leading cybersecurity experts — that the same issue existed in less-powerful AI models, but it also said that it “moved quickly to address the reported bypass.”
After spending the past two weeks working closely with “the government and other partners, including Amazon,” Anthropic said, it “trained an improved safety classifier that targets and blocks the behavior described in the report.”
Researchers at the National Institute of Standards and Technology’s Center for AI Standards and Innovation “have tested both our prior and new safeguards and agree that they are extraordinarily strong,” Anthropic added.
At the same time, the company warned that the changes could have some negative side effects for cybersecurity researchers seeking assistance with defensive work.
“The new classifier … comes at the cost of flagging benign requests more often during routine coding and debugging tasks,” Anthropic said. “As with all our safeguards, we’ll continue to refine this to better distinguish genuine misuse from legitimate requests and reduce false positives.”
Call for a more formal vetting process
While the dispute over Fable and Mythos might be over, the AI industry remains concerned about the Trump administration’s arbitrary approach to scrutinizing frontier models’ availability.
Trump recently issued an executive order establishing a process for frontier AI firms to offer the government early access to certain especially powerful models. On Wednesday, Anthropic said it would provide that early access for “models that materially advance the capability frontier in areas relevant to national security.” The company also said it would share threat intelligence about how hackers are abusing its tools and participate in the vulnerability clearinghouse established in Trump’s directive.
Anthropic stressed its commitment to working closely with the government to address potential AI security risks. It said it was “substantially scaling up” its partnerships with federal agencies, including through dedicated personnel and compute resources. It also vowed to “work with the government and with industry peers toward a shared, voluntary security and evaluation standard for frontier model providers.”
Anthropic also said it was problematic that there was “no agreed-upon standard” for classifying a jailbreak’s severity — an important precondition for any formal model-review process.
“A common standard for assessing AI jailbreaks would help us and other companies launch new models safely, as well as allow our users to make the most of their advanced capabilities,” Anthropic said.
To that end, the company announced, it is working with Amazon, Google, Microsoft and other members of Project Glasswing — through which Anthropic grants Mythos access to vetted organizations — on a “consensus framework” for jailbreak classifications and responses. The company said it envisioned the framework rating each potential jailbreak on four criteria, including how easy it is for someone to discover the workaround and how much additional model capability the workaround enables.