Anthropic Releases Claude Opus 4.8, Tops GPT-5.5 in Most Benchmarks

Anthropic's Claude Opus 4.8 outperforms OpenAI's GPT-5.5 in most benchmarks, scoring 69.2% on agentic coding tests, according to the company.

Anthropic has released Claude Opus 4.8, a new AI language model that the company claims outperforms competitors like OpenAI's GPT-5.5 across most benchmarks, while also communicating its own uncertainties better. The model scores 69.2% on agentic coding (SWE-Bench Pro), up from 64.3% for Opus 4.7 and 58.6% for GPT-5.5. For multidisciplinary reasoning (Humanity's Last Exam), Opus 4.8 scores 49.8% without tools and 57.9% with tools, the highest marks in the field.

Anthropic also introduced dynamic workflows that allow the model to schedule tasks and launch hundreds of parallel subagents, along with a new control that lets users determine how much effort the AI should put into generating a response. API pricing remains unchanged from its predecessor, Opus 4.7, at $5 per million input tokens and $25 per million output tokens. Fast Mode, which runs Opus 4.8 at 2.5x speed, now costs $10 per million input tokens and $50 per million output tokens.
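To make the quoted rates concrete, here is a minimal Python sketch that estimates the cost of a single request. It assumes plain linear billing at the list prices above and ignores any caching, batch, or volume discounts. The `RATES` table, the `request_cost` function, and the 20,000-input/2,000-output token workload are hypothetical illustrations, not part of Anthropic's API.

```python
# Per-million-token list prices in USD, as reported for Claude Opus 4.8.
# Assumption: billing is purely linear in token counts (no discounts or caching).
RATES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},  # Fast Mode
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimate the USD cost of one request from its token counts (hypothetical helper)."""
    rate = RATES[mode]
    # Multiply each token count by its per-million rate, then scale down by one million.
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    for mode in ("standard", "fast"):
        print(f"{mode}: ${request_cost(20_000, 2_000, mode):.4f}")
```

Because Fast Mode doubles both the input and the output rate, a Fast Mode request always costs exactly twice the standard price for the same token counts, whatever the mix of input and output tokens.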

Anthropic says the model's improved honesty is one of its most noticeable upgrades, with early testers reporting that it is more likely to flag uncertainties about its work and less likely to make unsupported claims. The company also says the model sets new highs on prosocial traits such as supporting user autonomy, while its rates of deception attempts and other misaligned behavior are on par with those of Claude Mythos.

Source: The Decoder

Key points

Anthropic's latest flagship model, Claude Opus 4.8, leads most benchmarks and is designed to be more upfront about its own mistakes.
Anthropic says Opus 4.8 beats its predecessor, OpenAI's GPT-5.5, and Google's Gemini 3.1 Pro across most tested categories.
On agentic coding (SWE-Bench Pro), the model hits 69.2 percent, up from 64.3 percent for Opus 4.7 and 58.6 percent for GPT-5.5.
For multidisciplinary reasoning (Humanity's Last Exam), Opus 4.8 scores 49.8 percent without tools and 57.9 percent with tools, the highest marks in the field.
Anthropic also introduced dynamic workflows that allow the model to schedule tasks and launch hundreds of parallel subagents.
API pricing remains unchanged from its predecessor, Opus 4.7, at $5 per million input tokens and $25 per million output tokens.
WRITTEN BY

Priya Anand

Emerging AI & Applications

Priya covers emerging AI applications and the wider impact of AI across industries.
