The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible
Trump administration officials tell WIRED that if Anthropic wants to rerelease Fable 5, it will need to ensure the model's guardrails can't be circumvented. Security experts say that can't be done.
The Trump administration’s disagreement with Anthropic over its most advanced AI models appears to be fast coming to a head.
Trump officials tell Inner Loop that if Anthropic wants to rerelease Claude Fable 5, the company will need to take steps to actually address what the government alleges are vulnerabilities. The administration took the AI model offline with export controls last week over concerns about jailbreaking, a method of using prompts to get around a model’s safeguards.
Anthropic has said for days that the administration’s concerns are overblown and that the effects of the jailbreaks are minimal. It reiterated this position to the Commerce Department and the office of National Cyber Director Sean Cairncross in a technical meeting on Monday.
But officials say they are past arguing whether the jailbreaks are significant, since the National Security Agency concluded that there are ways to disable the guardrails on Fable 5. Those guardrails are put in place to prevent users from accessing capabilities of the Mythos model related to cybersecurity, chemistry, and biology.
At this stage, the administration essentially views the situation as Anthropic’s problem to fix, according to three people familiar with discussions.
Neither the Commerce Department’s Center for AI Standards and Innovation nor the National Security Agency has the staff or the bandwidth to be drawn into chasing down every conceivable jailbreak on every model that reaches the market, the people said.
As a result, the administration believes that Anthropic should be more proactive about continually testing not just Fable 5 but all of its frontier AI models to find potential jailbreaks, and that it should flag them to the government on its own.
But on a more fundamental level, it remains unclear how Anthropic is supposed to prevent jailbreaking.
Independent cybersecurity experts have increasingly taken the view that guardrails on AI models are only a stopgap solution, since skilled users and future AI models will find ways to bypass constraints—meaning that what the White House appears to want cannot be done.
At the start of the week, Trump’s pick to serve as Acting Director of National Intelligence, Bill Pulte, was on track to never even start the job. Now Trump has thrown him a lifeline, and it is the permanent DNI nominee, Jay Clayton, who faces the prospect of never serving in the role.
To recap: Trump initially named Pulte, his housing finance chief, to replace outgoing DNI Tulsi Gabbard.
Faced with bipartisan pushback because Pulte doesn’t have the national security experience required by law for the