AI agents have become markedly better at completing freelance projects to a professional standard. On the Remote Labor Index, which measures how often AI agents can finish commercially valuable freelance tasks, the top automation rate has more than quadrupled in eight months. Fable 5, a model developed by Scale Labs, now reaches an automation rate of 16.1 percent. That is the highest score recorded on the index to date, up from a previous top score of 2.5 percent. The index covers 240 projects with a combined value of $144,000, spanning fields such as graphic design, video production, and data analysis.

The Remote Labor Index was built in collaboration with Scale Labs. Human evaluators at the Center for AI Safety score each piece of AI-generated work against a gold standard set by professionals. Fable 5's result, however, covers only 218 of the 240 projects, because U.S. government restrictions limited access to the model. Even in the worst case, where Fable 5 is counted as failing every project it was not tested on, its automation rate would still be 14.6 percent, ahead of every other model. The results show how quickly AI automation is advancing: Fable 5 has overtaken previous leaders such as Opus 4.8 and GPT-5.5.
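The worst-case figure follows from simple rescaling. The sketch below assumes that the 16.1 percent rate was computed over the 218 scored projects and that each unscored project counts as a failure. The whole-number count of automated projects is inferred from the rounded rate and does not appear in the source.

```python
# Worst-case rescaling of Fable 5's Remote Labor Index automation rate.
# Assumptions: the 16.1% rate was measured on 218 of the 240 projects,
# and every project that was not scored is counted as a failure.
scored_projects = 218
total_projects = 240
observed_rate = 0.161

# Infer the number of automated projects among those scored
# (0.161 * 218 = 35.098, which rounds to 35).
automated = round(observed_rate * scored_projects)

# Spread the same successes over all 240 projects (35 / 240 = 0.1458...).
worst_case_rate = automated / total_projects

print(f"{automated} projects automated, worst case {worst_case_rate:.1%}")
```

Running it prints `35 projects automated, worst case 14.6%`, which matches the lower bound reported above.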

Despite these gains, human evaluators remain essential for accurate assessment. AI judges consistently overrate new models, awarding scores up to three times higher than human evaluators do. According to the Center for AI Safety, judging the work fairly requires operating professional software tools and understanding the nuances of what clients expect. The evaluation environment is a virtual Linux machine with more than 30 professional applications, such as Blender and GIMP, in which AI agents operate graphical programs directly. The setup is meant to test models under realistic conditions that reflect the actual challenges of professional work.

Source: thedecoder