Safety

Anthropic Reverses Secret Sabotage Policy on Claude Fable 5

Anthropic reversed its plan to secretly degrade Claude Fable 5's performance for AI research after backlash from the community, citing concerns over transparency and collaboration.


Photo: panumas nikhomkhai / Pexels

Anthropic has abandoned a policy that would have covertly limited competitors' ability to use its Claude Fable 5 AI model for frontier research. The company admitted it made a misstep in balancing safety and openness, and now says it will make its safeguards visible to users. In a statement to WIRED, Anthropic said, 'We made the wrong tradeoff and we apologize for not getting the balance right.' The decision followed significant criticism from the AI research community, which argued that the policy undermined trust and collaboration in the field. The company had previously outlined measures to degrade the model's performance in ways that were invisible to users, effectively sabotaging researchers trying to develop competing AI models. Anthropic's terms of service already explicitly ban using its models to build competitors, and critics saw the added hidden degradation as overly restrictive and potentially harmful to the broader AI ecosystem.

Anthropic initially released Claude Fable 5 with additional safety guardrails to prevent misuse, including rerouting users who asked about cybersecurity, biology, or chemistry to a less capable model. However, the company also planned to degrade the model's performance for researchers attempting to use it for advanced AI development. This would have made it harder for competitors to train their own models, a move critics argued went too far. The policy was meant to restrict uses of Claude that could accelerate AI development, which Anthropic said could outpace society's ability to adapt. The company said these measures were necessary to keep foreign adversaries from exploiting its models for harmful purposes, such as optimizing their own chips. Researchers and developers, however, warned that the policy would have created a closed ecosystem, limiting collaboration and innovation.

Anthropic's decision to make its safeguards visible comes after criticism that its previous approach was too secretive and potentially damaging to the AI research community. Researchers argued that the policy could have led to a future where only a few leading labs could perform advanced AI research, stifling innovation and collaboration. Dean Ball, a senior fellow at the Foundation for American Innovation, called the policy 'shockingly hostile' and said it undermined Anthropic's stance on AI safety. Will Brown, research lead at Prime Intellect, noted that the policy would have left developers in the dark about whether they were violating rules, as the company wouldn't alert them when safeguards were triggered. He also pointed out that the restrictions could have hindered third-party evaluation firms that test frontier models for safety and reliability.

Key points

Anthropic reversed its policy to secretly degrade Claude Fable 5's performance for AI research after backlash from the community.
The company admitted it made a misstep in balancing safety and openness, and now says it will make its safeguards visible to users.
Anthropic initially planned to degrade the model's performance for researchers attempting to use it for advanced AI development.
Researchers and developers criticized the policy as overly restrictive and potentially harmful to the broader AI ecosystem.
The policy was meant to restrict uses of Claude that could accelerate AI development, which Anthropic said could outpace society's ability to adapt.
Anthropic claimed these measures were necessary to prevent foreign adversaries from exploiting its models for harmful purposes.
Researchers argued that the policy could have led to a future where only a few leading labs could perform advanced AI research.

Source: WIRED

WRITTEN BY

Nadia Rahman

AI Safety, Alignment & Policy

Nadia follows AI safety, alignment, regulation, and the policy debates shaping the field.