AI & Machine Learning
Business Insider4 days ago
0

Anthropic: 'We made the wrong tradeoff' in new model guardrails

AI

Anthropic reversed its policy on Claude Fable 5's secret safeguards after developer backlash, now making model downgrades visible to users.

Anthropic: 'We made the wrong tradeoff' in new model guardrails

Intelligence Insights

Context + impact, normalized for TechCulture.

The Big Picture
Anthropic walked back a policy that silently degraded Claude Fable 5's performance for AI development queries, following backlash from the developer community. The company now makes flagged requests visibly fall back to Opus 4.8 and provides refusal reasons on the API. Anthropic acknowledged the wrong tradeoff and apologized, stating the safeguards are meant to prevent misuse by foreign adversaries. The Mythos model, which powers Fable 5, is considered one of the most powerful AI systems and raises national security concerns. The safeguards affect only a small fraction of coding and machine learning work. Mythos is released only to approved government and other users, not the public.
Why It Matters
Anthropic's reversal on Claude Fable 5's hidden safeguards highlights the tension between AI safety and developer trust. By making model limitations transparent, the company prioritizes community accountability over opaque security measures, setting a precedent for how AI labs balance national security concerns with open innovation.

Deepen your understanding

Use our AI to break down complex signals.

Select an AI action to generate more depth.

Dario Amodei
Dario Amodei
Anthropic walked back a new policy it had for AI research.

Bloomberg/Getty Images

  • Anthropic reversed its policy on Claude Fable 5's secret safeguards after AI community backlash.
  • Claude Fable 5 now visibly reroutes flagged AI requests to Opus 4.8.
  • Anthropic's Mythos model remains a concern for national security due to its advanced capabilities.

Anthropic just flip-flopped on a policy that was silently limiting what some AI researchers could do with its new Claude Fable 5 model.

Earlier this week, the AI lab released Claude Fable 5, a public version of the Mythos model designed with extra safety measures to prevent misuse.

At release, Anthropic said that it took precautions like rerouting questions about cybersecurity, biology, and chemistry to less capable models to ensure people cannot use the advanced model to plan cyberattacks or build a bioweapon.

The lab also said that for those trying to use Fable 5 for AI development, the company would degrade the model's performance without explaining the change to the user. Some in the developer community saw the move as a quiet way to prevent others from creating rival AI systems, Business Insider previously reported.

Wednesday's statement reverses that move: Fable 5 will now tell users whether their prompt is being refused or rerouted.

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," an Anthropic spokesperson said in a statement to Business Insider on Wednesday. "Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal."

The company added, "We made the wrong tradeoff, and we apologize for not getting the balance right."

Anthropic said its set of safeguards is in place to address national security issues, so that "foreign adversaries" cannot get ahead in developing frontier chips and large language models. It added that a vast majority of coding and machine learning work is unaffected by these safeguards.

Announced in April, Anthropic's Mythos is considered one of the most powerful AI systems ever developed, with governments and security agencies warning that its capabilities surpass current public models in advanced reasoning, cybersecurity, and scientific research.

Researchers and Anthropic itself have flagged that Mythos may be used to accelerate cyberattacks, aid biological or chemical weapons research, and give foreign actors a dangerous new tool. The model is being released to a limited group of government and other approved users, not to the public.

Read the original article on Business Insider
Big Tech AI Cybersecurity Policy

Intelligence Exchange

0

Log in to participate in the exchange.

Sign In

Syncing Discussions...

Finding Related Intelligence...
Anthropic: 'We made the wrong tradeoff' in new model guardrails | TechCulture