Anthropic released Fable 5, its first Mythos-class model, with cybersecurity, biology, and chemistry topics blocked to prevent misuse, according to the company. The model is designed to route queries on sensitive topics to the earlier Claude Opus 4.8 model and to warn users when this happens. The company claims Fable 5 shows significant improvements on cybersecurity benchmarks, scoring 78 percent on ExploitBench compared with 40 percent for Opus 4.8. Fable 5 also features stricter safeguards that may occasionally block harmless requests, though such false positives occurred in fewer than five percent of sessions during testing.

Anthropic said it has tuned these safeguards to be 'stricter than ideal,' acknowledging that the system may frustrate some users. The company emphasized that these measures are necessary to avoid situations where malicious actors could gain assistance in 'causing serious harm that they couldn’t have received from other sources.' The new model also resists automated jailbreak attempts more effectively than previous Claude Opus models, according to Anthropic. The company is particularly concerned about Fable 5’s capacity for 'agentic hacking,' in which the model could carry out multi-part cyberattacks with greater ease than earlier models.

Anthropic said it is worried that 'well-resourced malicious actors' could use even seemingly benign queries on chemistry and biology topics to assist with 'highly risky biological research' more effectively than with previous models. The company acknowledges that making certain topics off-limits is a double-edged sword, since the same queries could benefit cybersecurity professionals and biology researchers if made available to the right people. It plans to expand its Project Glasswing program, in consultation with the US government, to include more cybersecurity professionals and life sciences organizations.

Source: arstechnica