Model Release

Anthropic Says Claude Writes Over 90% of Its Code

Anthropic reports that Claude now writes more than 80% of its production code, and leadership estimates the share exceeds 90% once scripts and experimental code are included.


Anthropic has shared internal data, for the first time, on how much Claude is allegedly speeding up the company's own AI development. According to that data, Claude now writes more than 80 percent of the company's production code, a major shift in how Anthropic builds its own software. The company also noted that Claude is improving at research tasks and is closing in on human-level judgment in certain areas.

The core message is that recursive self-improvement, meaning an AI system that fully autonomously designs its own successor, hasn't been achieved yet, but it "could come sooner than most institutions are prepared for." Anthropic is warning about the risks of fully autonomous AI self-improvement and pushing for a verifiable, global development pause. A unilateral halt by any single lab wouldn't be enough, the company argues.

Internal data shows a steep rise in code volume. According to Anthropic, engineers in Q2 2026 are shipping an average of eight times as much code per day as they did in 2024, before the introduction of coding agents. More than 80 percent of the code going into the production codebase comes from Claude. Before Claude Code launched in February 2025, that share was still in the low single digits. Leadership estimates the total share, including scripts and experimental code, at more than 90 percent. One employee is quoted saying, "it's now been ~5 months since I last wrote any code myself."

Anthropic admits that lines of code are an imperfect metric. The eightfold increase "is almost certainly an overstatement of the true productivity gain." In an internal survey from March 2026 with 130 employees, the median estimate pegged the output boost from Mythos Preview at 4x. Anthropic itself thinks the real number is a bit lower and points to recent METR research showing that developers tend to overestimate AI productivity gains.

An automated Claude reviewer would have caught about a third of the bugs behind past incidents on claude.ai before they hit production, according to a retrospective analysis. In another example, Claude delivered more than 800 fixes in April 2026 that cut a class of API errors by a factor of 1,000. A human would have needed four years for that work, according to the engineer in charge.

Rising success rates on complex tasks: Anthropic's internal data shows how Claude's coding abilities have improved over time. The model has seen a sharp jump in its solve rate on open-ended problems since early 2026. According to Anthropic, the duration of tasks that AI systems can reliably handle on their own is now doubling roughly every four months, compared with every seven months previously. In March 2024, Claude Opus 3 could manage tasks in the four-minute range. A year later, Claude Sonnet 3.7 handled an hour and a half. Claude Opus 4.6 now tackles 12-hour tasks. METR found that Claude Mythos Preview could work for "at least" 16 hours and was "at the upper end of what [METR] can measure without new tasks." If the trend holds, day-long tasks could be within reach this year, with week-long tasks following in 2027, Anthropic says.
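The extrapolation to day-long and week-long tasks can be checked with a few lines of arithmetic. The sketch below is not Anthropic's or METR's method. It assumes steady exponential growth, takes METR's "at least" 16 hours as the starting point, uses the four-month doubling time cited above, and reads "day-long" and "week-long" as 24 and 168 continuous hours. The function name `months_until` is made up for this illustration.

```python
import math


def months_until(target_hours: float, current_hours: float, doubling_months: float) -> float:
    """Months until the task horizon reaches target_hours, assuming it keeps
    doubling every doubling_months (a simplifying assumption, not a forecast)."""
    if target_hours <= current_hours:
        return 0.0
    # Number of doublings needed, multiplied by the time each doubling takes.
    return doubling_months * math.log2(target_hours / current_hours)


CURRENT_HOURS = 16.0    # METR's "at least" figure for Claude Mythos Preview
DOUBLING_MONTHS = 4.0   # doubling time cited by Anthropic

# Assumption: "day-long" = 24 hours, "week-long" = 7 * 24 hours of continuous work.
day = months_until(24.0, CURRENT_HOURS, DOUBLING_MONTHS)
week = months_until(7 * 24.0, CURRENT_HOURS, DOUBLING_MONTHS)

print(f"day-long tasks:  ~{day:.1f} months")   # ~2.3 months
print(f"week-long tasks: ~{week:.1f} months")  # ~13.6 months
```

Under these assumptions, day-long tasks arrive a little over two months after the 16-hour mark and week-long tasks roughly 14 months after it, which is in line with the "this year" and "2027" framing. A slower doubling time or a stricter definition of "week-long" would push both dates out.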

Source: The Decoder

Key points

Anthropic has shared internal data showing that Claude now writes more than 80 percent of the code going into the company's production codebase.
Leadership estimates the total share, including scripts and experimental code, at more than 90 percent.
According to Anthropic, Claude-written code was somewhat worse than human-written code at the company in late 2025 and is roughly at parity today, and the company expects it to be strictly better within the year.
Claude delivered more than 800 fixes in April 2026 that cut a class of API errors by a factor of 1,000.
The duration of tasks that AI systems can reliably handle on their own is now doubling roughly every four months, compared with every seven months previously.
METR found that Claude Mythos Preview could work for "at least" 16 hours and was "at the upper end of what [METR] can measure without new tasks."


WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.
