Model Release

Fable 5 Leads as AI Agents Now Complete 16 Percent of Freelance Jobs at Pro Quality

AI agents can now complete 16 percent of freelance jobs at professional quality, up from 2.5 percent eight months ago, according to the Remote Labor Index.

Close-up of a futuristic humanoid robot under dramatic lighting in dark ambiance.

Photo: Pavel Danilyuk / Pexels

AI agents have become significantly better at completing freelance projects to a professional standard. The Remote Labor Index, which measures how often AI agents can finish commercially valuable freelance tasks, shows that the top automation rate has risen more than sixfold in eight months, from 2.5 percent to 16.1 percent. That top score belongs to Fable 5, a model developed by Scale Labs, and is the highest ever recorded on the index. The index covers projects in fields including graphic design, video production, and data analysis; its full set comprises 240 projects with a combined value of $144,000.

The Remote Labor Index was created in collaboration with Scale Labs and relies on human evaluators at the Center for AI Safety, who score AI-generated work against a gold standard set by professionals. Fable 5's score, however, is based on only 218 of the 240 projects because of U.S. government restrictions on model access. Even in the worst case, with Fable 5 counted as failing all 22 missing projects, its automation rate would still be 14.6 percent, ahead of every other model. Fable 5 surpasses previous leaders such as Opus 4.8 and GPT-5.5, underscoring how quickly AI automation is advancing.
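The worst-case figure follows from simple arithmetic. Below is a minimal sketch in Python, assuming the reported 16.1 percent is the share of the 218 evaluated projects judged at professional quality, and that every unevaluated project counts as a failure. The function name `worst_case_rate` is hypothetical and used only for illustration.

```python
def worst_case_rate(rate_on_evaluated: float, evaluated: int, total: int) -> tuple[int, float]:
    """Lower bound on the automation rate if all unevaluated projects are failures.

    Assumption: the reported rate is successes / evaluated, so the number of
    successful projects is recovered by rounding rate * evaluated.
    """
    successes = round(rate_on_evaluated * evaluated)  # whole projects only
    return successes, successes / total  # missing projects add to the denominator only


successes, lower_bound = worst_case_rate(0.161, evaluated=218, total=240)
print(successes, f"{lower_bound:.1%}")
```

Rounding 0.161 × 218 gives 35 successful projects; spreading those 35 over all 240 projects yields roughly 14.6 percent, which matches the worst case cited above.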

Despite these gains, human evaluators remain essential for accurate assessment. AI judges consistently overrate the performance of new models, awarding scores up to three times higher than human evaluators do. The Center for AI Safety explains that judging fairly requires using professional software tools and understanding the nuances of client expectations. The evaluation environment is a virtual Linux machine with more than 30 professional applications, such as Blender and GIMP, on which AI agents operate graphical programs directly. The setup is designed to test models under realistic conditions that reflect the actual challenges of professional work.

Source: The Decoder

Key points

AI agents can now complete 16 percent of freelance jobs at professional quality, up from 2.5 percent eight months ago.
The Remote Labor Index measures how often AI agents finish commercially valuable freelance projects at a quality level a paying client would accept.
Fable 5 achieves an automation rate of 16.1 percent, the highest score ever recorded, surpassing Opus 4.8's 8.3 percent.
The Remote Labor Index includes 240 projects worth a combined $144,000, sourced from 358 verified freelancers.
Human evaluators at the Center for AI Safety score AI-generated work against a gold standard created by a paid professional.
AI judges overrate the performance of new models, with scores up to three times higher than human evaluations.


WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.