Model Release

Claude Fable 5 Outperforms GPT-5.5 on FrontierMath Benchmark

Anthropic's Claude Fable 5 scored 88% on tier 4, the hardest tier of FrontierMath, outperforming GPT-5.5 by about 13 percentage points in June 2026.


Anthropic's latest model, Claude Fable 5, has outperformed GPT-5.5 on FrontierMath, a widely recognized benchmark for AI math reasoning. According to Epoch AI, Fable 5 achieved 87% accuracy on tiers 1 through 3 and 88% on tier 4 (v2), the most challenging tier. The results mark a significant improvement over previous models and point to Anthropic's rapid progress in mathematical capabilities.

The gap between Fable 5 and GPT-5.5 is notable. OpenAI's GPT-5.5 scored approximately 75% on tier 4, about 13 percentage points below Fable 5's 88%. GPT-5.6 is currently in development, and it remains to be seen how it will compare. All models were evaluated using Epoch AI's standard scaffold with maximum reasoning effort, which keeps the comparison consistent.
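For readers who want to check the arithmetic, here is a minimal sketch of how the reported gap can be read. It uses only the two tier 4 scores quoted above (GPT-5.5's figure is approximate); the function names are illustrative and do not come from Epoch AI's tooling. It separates the absolute gap in percentage points from the relative gain, which are easy to confuse.

```python
# Scores as reported in the article; GPT-5.5's figure is approximate.
FABLE_TIER4 = 88.0  # percent accuracy, FrontierMath tier 4 (v2)
GPT_TIER4 = 75.0    # percent accuracy, approximate


def point_gap(a: float, b: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return a - b


def relative_gain(a: float, b: float) -> float:
    """Improvement of score a over score b, as a percent of b."""
    return (a - b) / b * 100.0


gap = point_gap(FABLE_TIER4, GPT_TIER4)
rel = relative_gain(FABLE_TIER4, GPT_TIER4)

print(f"gap: {gap:.0f} percentage points")
print(f"relative gain: {rel:.1f}%")
```

The "13 points" in the headline figure is the first number, the absolute gap. The relative gain is larger because it is measured against GPT-5.5's lower score.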

FrontierMath is considered one of the most difficult benchmarks for AI math reasoning, which makes Fable 5's performance particularly notable. The gains are not limited to benchmarks: real-world examples continue to accumulate, including recent solutions to longstanding mathematical problems credited to both OpenAI and Claude Mythos.

Key points

Claude Fable 5 achieved 87% accuracy on tiers 1 through 3.
Claude Fable 5 achieved 88% accuracy on tier 4 (v2).
GPT-5.5 reached about 75% accuracy on tier 4.
Fable 5 outperformed GPT-5.5 by about 13 percentage points on tier 4.
FrontierMath is widely considered one of the toughest benchmarks for AI math reasoning.

Source: The Decoder

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.