Model Release

Anthropic Launches Claude Sonnet 5, Closes Gap to Opus 4.8

Anthropic's Claude Sonnet 5 outperforms its predecessor and nearly matches Opus 4.8 in benchmarks, with a score of 1,618 on GDPval-AA v2.

Anthropic has released Claude Sonnet 5, which the company describes as its most agentic Sonnet model to date. It can build plans independently and use tools such as browsers and terminals. The model is available now on all Anthropic platforms at an introductory discount, with standard pricing taking effect after August 31, 2026.

Sonnet 5 improves significantly on its predecessor, Sonnet 4.6, and gains ground on the pricier Opus 4.8. On agentic coding, it scores 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6; Opus 4.8 scores 69.2 percent. On Terminal-Bench 2.1, Sonnet 5 reaches 80.4 percent, compared to 67.0 percent for Sonnet 4.6. For multidisciplinary reasoning, it scores 57.4 percent with tools, nearly matching Opus 4.8 at 57.9 percent. On computer use, it scores 81.2 percent, compared to 78.5 percent for its predecessor. On the GDPval-AA v2 benchmark, Sonnet 5 scores 1,618 points, edging past Opus 4.8's 1,615.

According to Anthropic, feedback from early-access partners confirms these results. The company says the model acts more agentically than previous versions and handles search tasks more effectively. Overall, Sonnet 5 outperforms Sonnet 4.6 in every tested category and approaches Opus 4.8 in performance.

The model is live on all plans. It is the new default for Free and Pro users and is also available to Max, Team, and Enterprise users. Developers can use it in Claude Code and on the Claude Platform. The API model identifier is 'claude-sonnet-5'. The model has a training cutoff of January 2026 and a one-million-token context window.

Until August 31, 2026, Anthropic is charging $2 per million input tokens and $10 per million output tokens. After that, prices rise to $3 and $15, the same as for previous Sonnet models. Real-world costs may be higher because the model uses more tokens. The same pattern occurred when Opus moved from 4.6 to 4.7.
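
To make the pricing difference concrete, the following Python sketch estimates the bill for a workload at the introductory and standard rates quoted above. The per-million-token prices come from the article; the workload size (50 million input tokens, 10 million output tokens) and the names `Pricing` and `cost_usd` are illustrative assumptions, not anything Anthropic publishes.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float


# Rates quoted in the article.
INTRO = Pricing(input_per_million=2.0, output_per_million=10.0)     # until August 31, 2026
STANDARD = Pricing(input_per_million=3.0, output_per_million=15.0)  # after that date


def cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Assumed monthly workload, chosen only for illustration.
    input_tokens, output_tokens = 50_000_000, 10_000_000
    intro = cost_usd(INTRO, input_tokens, output_tokens)
    standard = cost_usd(STANDARD, input_tokens, output_tokens)
    print(f"Introductory: ${intro:,.2f}")
    print(f"Standard:     ${standard:,.2f}")
    print(f"Increase:     {100 * (standard - intro) / intro:.0f}%")
```

Because both rates rise by the same proportion, the standard price works out to 1.5 times the introductory price for any mix of input and output tokens. The sketch does not model the higher token usage the article mentions, which would raise the bill further.
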
Anthropic also claims the model poses lower cybersecurity risk than its predecessors, with default safeguards that flag and block risky cyber usage in real time. These protections are similar to those in Opus 4.7 and 4.8 but less stringent than those in Fable 5. The company rates the overall cybersecurity risk from Sonnet 5 as low. On safety, Sonnet 5 is better than Sonnet 4.6 at refusing malicious requests and resisting prompt injection attacks. Hallucinations and sycophantic behavior are also reduced. Anthropic details its full safety evaluation in the Claude Sonnet 5 System Card.

Source: The Decoder

Key points

Anthropic released Claude Sonnet 5, which the company calls its most agentic Sonnet yet.
Claude Sonnet 5 outperforms its predecessor, Sonnet 4.6, across every tested category.
Sonnet 5 scores 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6.
On Terminal-Bench 2.1, Sonnet 5 reaches 80.4 percent versus Sonnet 4.6's 67.0 percent.
Sonnet 5 scores 57.4 percent on multidisciplinary reasoning with tools, nearly matching Opus 4.8 at 57.9 percent.
On computer use, Sonnet 5 scores 81.2 percent compared to 78.5 percent for its predecessor.
On GDPval-AA v2, Sonnet 5 scores 1,618 points, surpassing Opus 4.8's 1,615.
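
The gaps behind these key points can be worked out directly from the reported figures. The Python sketch below uses only the scores given in the article and marks any figure the article omits as `None`; the dictionary layout and the names `SCORES`, `delta`, and `summarize` are illustrative assumptions.

```python
from typing import Optional

# Scores as reported in the article. None means the article gives no figure.
# GDPval-AA v2 is reported in points; the other benchmarks are percentages.
SCORES = {
    "SWE-bench Pro (%)": {"sonnet_5": 63.2, "sonnet_4_6": 58.1, "opus_4_8": 69.2},
    "Terminal-Bench 2.1 (%)": {"sonnet_5": 80.4, "sonnet_4_6": 67.0, "opus_4_8": None},
    "Multidisciplinary reasoning, with tools (%)": {"sonnet_5": 57.4, "sonnet_4_6": None, "opus_4_8": 57.9},
    "Computer use (%)": {"sonnet_5": 81.2, "sonnet_4_6": 78.5, "opus_4_8": None},
    "GDPval-AA v2 (points)": {"sonnet_5": 1618, "sonnet_4_6": None, "opus_4_8": 1615},
}


def delta(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Return a - b rounded to one decimal, or None if either value is missing."""
    if a is None or b is None:
        return None
    return round(a - b, 1)


def summarize(scores: dict) -> dict:
    """For each benchmark, compute Sonnet 5's gain over Sonnet 4.6 and its gap to Opus 4.8."""
    return {
        name: {
            "gain_over_sonnet_4_6": delta(s["sonnet_5"], s["sonnet_4_6"]),
            "gap_to_opus_4_8": delta(s["sonnet_5"], s["opus_4_8"]),
        }
        for name, s in scores.items()
    }


if __name__ == "__main__":
    for name, row in summarize(SCORES).items():
        print(f"{name}: {row}")
```

On these figures, the largest gain over Sonnet 4.6 is on Terminal-Bench 2.1 (13.4 percentage points), while the largest remaining gap to Opus 4.8 is on SWE-bench Pro (6.0 percentage points).
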


WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.