Anthropic Study Shows AI Can Build Exploits in Hours, Not Weeks

Anthropic's study reveals large language models can create exploits from security patches in hours, not weeks, with Mythos Preview achieving results in under six hours.

Anthropic's security research team has demonstrated that large language models can now build working exploits from security patches in hours rather than weeks. According to the study, a single operator without specialized expertise can turn a month's worth of patches into exploits in a single afternoon for a few thousand dollars. The researchers say this drastically shortens the time between a patch's release and a working exploit, challenging long-standing assumptions about how quickly attackers can reverse-engineer patches.

The study tested six Claude models, including the unreleased Mythos Preview, on security patches for SpiderMonkey, Firefox's JavaScript engine. Mythos Preview identified and exploited 14 of 18 vulnerabilities in under six hours, with the first exploit ready within an hour of the patch going live. Opus 4.8 and Opus 4.6 were less successful: Opus 4.8 found 11 vulnerabilities and Opus 4.6 just two. In reliability tests, Mythos Preview reproduced seven of the 18 bugs on every attempt, while the other models matched that consistency for only one vulnerability each.

The researchers also tested Mythos Preview on 21 Windows kernel vulnerabilities, all of which allow privilege escalation. Mythos Preview found 18 of the 21 vulnerabilities in under six hours, with eight working exploits developed at a total cost of about $15,700. Opus 4.8 and Sonnet 4.6 scored lower, finding 15 and 13 vulnerabilities respectively. Microsoft had classified 14 of the 21 vulnerabilities as unlikely to be exploited, but Mythos Preview cracked 13 of those and achieved full privilege escalation for one of them. According to Anthropic, Microsoft's rating system is calibrated for human researchers, and the rise of AI models like Mythos Preview will require a reevaluation of these assessments.

Source: The Decoder

Key points

Anthropic's security team found large language models can create exploits from security patches in hours, not weeks.
Mythos Preview identified 14 out of 18 SpiderMonkey vulnerabilities in under six hours, with the first exploit ready within an hour.
Mythos Preview found 18 of 21 Windows kernel vulnerabilities in under six hours, with eight working exploits developed at a total cost of about $15,700.
Microsoft classified 14 of the 21 Windows vulnerabilities as unlikely to be exploited, but Mythos Preview cracked 13 of those.
Anthropic argues the term 'N-Day' is now misleading and 'N-Hour' better describes the new reality of exploit development.
Publicly available Claude models can also develop exploits when safety filters are turned off, though less successfully than Mythos Preview.

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.