Anthropic Releases Claude Opus 4.8, Tops GPT-5.5 in Most Benchmarks
Anthropic's Claude Opus 4.8 outperforms OpenAI's GPT-5.5 in most benchmarks, scoring 69.2% on agentic coding tests, according to the company.
Anthropic has released Claude Opus 4.8, a new AI language model that the company claims outperforms competitors like OpenAI's GPT-5.5 across most benchmarks while also communicating its own uncertainties better.

The model scores 69.2% on agentic coding (SWE-Bench Pro), up from 64.3% for Opus 4.7 and ahead of GPT-5.5 at 58.6%. For multidisciplinary reasoning (Humanity's Last Exam), Opus 4.8 scores 49.8% without tools and 57.9% with tools, the highest marks in the field.

Anthropic also introduced dynamic workflows that allow the model to schedule tasks and launch hundreds of parallel subagents, along with a new control that lets users determine how much effort the AI should put into generating a response.

API pricing remains unchanged from its predecessor, Opus 4.7, at $5 per million input tokens and $25 per million output tokens. Fast Mode, which runs Opus 4.8 at 2.5x speed, now costs $10 per million input tokens and $50 per million output tokens.

Anthropic says the model's improved honesty is one of its most noticeable upgrades, with early testers reporting that it is more likely to flag uncertainties about its work and less likely to make unsupported claims. The company also noted that the model sets new highs on prosocial traits like supporting user autonomy, while deception attempts and other unaligned behavior are at Claude Mythos levels.

*Source: [The Decoder](https://the-decoder.com/anthropic-ships-claude-opus-4-8-as-a-modest-but-tangible-improvement-that-tops-gpt-5-5-in-most-benchmarks/)*
Key points
- Anthropic's latest flagship model, Claude Opus 4.8, leads most benchmarks and is designed to be more upfront about its own mistakes.
- Anthropic says Opus 4.8 beats its predecessor, OpenAI's GPT-5.5, and Google's Gemini 3.1 Pro across most tested categories.
- On agentic coding (SWE-Bench Pro), the model hits 69.2 percent, up from 64.3 percent for Opus 4.7 and 58.6 percent for GPT-5.5.
- For multidisciplinary reasoning (Humanity's Last Exam), Opus 4.8 scores 49.8 percent without tools and 57.9 percent with tools, the highest marks in the field.
- Anthropic also introduced dynamic workflows that allow the model to schedule tasks and launch hundreds of parallel subagents.
- API pricing remains unchanged from its predecessor, Opus 4.7, at $5 per million input tokens and $25 per million output tokens.
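
To put the reported per-token prices in concrete terms, here is a minimal Python sketch that estimates the cost of a single request. The rates are the ones quoted in this article ($5 and $25 per million tokens for standard use, $10 and $50 for Fast Mode). The names `Rate`, `RATES`, and `estimate_cost`, and the mode labels `"standard"` and `"fast"`, are hypothetical helpers for illustration and are not part of any Anthropic API. The token counts in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates as reported in the article; the mode labels are illustrative only.
RATES = {
    "standard": Rate(input_per_million=5.0, output_per_million=25.0),
    "fast": Rate(input_per_million=10.0, output_per_million=50.0),
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the estimated cost in USD for one request."""
    rate = RATES[mode]
    total = (input_tokens * rate.input_per_million
             + output_tokens * rate.output_per_million)
    return total / 1_000_000


# Example: 200,000 input tokens and 50,000 output tokens (made-up figures).
print(estimate_cost(200_000, 50_000))          # 2.25
print(estimate_cost(200_000, 50_000, "fast"))  # 4.5
```

At these rates Fast Mode costs exactly twice as much per token as standard mode, so the same request doubles in price in exchange for the reported 2.5x speed.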