Anthropic released its latest model, Fable, on Tuesday, positioning it as a limited, publicly available version of its cybersecurity model, Mythos. Cybersecurity researchers and professionals, however, have complained that the model's restrictive guardrails block a wide range of tasks, including some not directly related to cybersecurity. Valentina 'Chompie' Palmiotti, a security researcher at IBM X-Force, noted that Fable rejects any request that could be even tangentially cyber-related, such as reading a blog post. When a prompt triggers the guardrails, Fable pauses the chat and states that its 'safety measures flagged this message for cybersecurity or biology topics.'

The guardrails are intended to reduce the risk of Fable being used to develop malware or compromise software, a long-standing concern at Anthropic. The restrictions on biology-related topics stem from a similar concern about the development of biological weapons. When Mythos was first released in April, access was limited to a select group of companies and organizations through Project Glasswing. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries. Despite these efforts, many cybersecurity experts remain frustrated by what they see as the arbitrary nature of the restrictions.

Cybersecurity professionals who want fewer limitations when using Claude for cybersecurity work must apply to Anthropic's Cyber Verification Program. OpenAI has a similar program called Trusted Access for Cyber. Fable is programmed to fall back to Claude Opus 4.8 when a request hits a guardrail. The guardrail appears to be keyword-based: any term in the lexical field of 'cybersecurity' seems to trigger it.
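
To illustrate why a keyword-based guardrail of the kind described above could flag benign requests, here is a minimal Python sketch. It is not Anthropic's implementation, whose details are not public. The term list, the model labels, and the functions flagged_terms and route are all hypothetical and exist only for this example.

```python
import re

# Hypothetical keyword list; the real trigger terms are not public.
FLAGGED_TERMS = {"malware", "exploit", "vulnerability", "ransomware", "phishing"}

# Illustrative labels only, not real API model identifiers.
PRIMARY_MODEL = "fable"
FALLBACK_MODEL = "claude-opus-4.8"


def flagged_terms(prompt: str) -> set[str]:
    """Return the flagged terms that appear as whole words in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return words & FLAGGED_TERMS


def route(prompt: str) -> str:
    """Pick a model label: fall back whenever any flagged term is present."""
    return FALLBACK_MODEL if flagged_terms(prompt) else PRIMARY_MODEL


if __name__ == "__main__":
    for prompt in [
        "Summarize this blog post about a patched vulnerability.",
        "Write a haiku about autumn.",
    ]:
        print(f"{route(prompt)}: {prompt}")
```

Under these assumptions, the request to summarize a blog post is rerouted purely because it contains the word "vulnerability", even though the task itself is harmless. A filter that matches on vocabulary rather than intent would produce exactly this kind of false positive.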

Source: TechCrunch