Anthropic has shared internal data showing that Claude now writes more than 80 percent of the company's production code, a major shift in how Anthropic builds its own software. The company also noted that Claude is improving at research tasks and is closing in on human-level judgment in certain areas. Anthropic is warning about the risks of fully autonomous AI self-improvement and pushing for a verifiable, global development pause. A unilateral halt by any single lab wouldn't be enough, the company argues. Anthropic is sharing internal data for the first time that shows how much Claude is allegedly speeding up its own AI development. The core message is that recursive self-improvement—an AI system that fully autonomously designs its own successor—hasn't been achieved yet, but it "could come sooner than most institutions are prepared for." Anthropic admits that lines of code are an imperfect metric. The eightfold increase "is almost certainly an overstatement of the true productivity gain." In an internal survey from March 2026 with 130 employees, the median estimate pegged the output boost from Mythos Preview at 4x. Anthropic itself thinks the real number is a bit lower and points to recent METR research showing that developers tend to overestimate AI productivity gains. One employee is quoted saying, "it's now been ~5 months since I last wrote any code myself." Internal data shows a steep rise in code volume: With Claude's help, Anthropic developers in Q2 2026 are producing an average of eight times as much code as before the introduction of coding agents.

According to Anthropic, engineers in Q2 2026 are shipping an average of eight times as much code per day as they did in 2024. More than 80 percent of the code going into the production codebase comes from Claude. Before Claude Code launched in February 2025, that share was still in the low single digits. Leadership estimates the total share—including scripts and experimental code—at more than 90 percent. An automated Claude reviewer would have caught about a third of the bugs behind past incidents on claude.ai before they hit production, according to a retrospective analysis. In another example, Claude delivered more than 800 fixes in April 2026 that cut a class of API errors by a factor of 1,000. A human would have needed four years for that work, according to the engineer in charge.

Rising success rates on complex tasks: Anthropic's internal data shows how Claude's coding abilities have improved over time. The model has seen a sharp jump in its solve rate on open-ended problems since early 2026. According to Anthropic, the duration of tasks that AI systems can reliably handle on their own is now doubling roughly every four months, down from seven. In March 2024, Claude Opus 3 could manage tasks in the four-minute range. A year later, Claude Sonnet 3.7 handled an hour and a half. Claude Opus 4.6 now tackles 12-hour tasks. METR found that Claude Mythos Preview could work for "at least" 16 hours and was "at the upper end of what [METR] can measure without new tasks." If the trend holds, day-long tasks could be within reach this year, with week-long tasks following in 2027, Anthropic says.

Source: thedecoder