Safety

Meta Employees Warn AI Moderation Rollout Is Too Fast

Meta has replaced about half of human moderation tasks with AI models in 2025, planning to reach 90% for some content by year-end, according to the Financial Times.

Meta has already replaced roughly half of all human moderation requests with large language models in 2025 and plans to push that share above 90 percent for some content types by the end of the year. The shift is expected to save the company billions annually, according to the Financial Times. Meta disputes the cost argument and points to quality instead, saying that since March, tests show its language models make 13 percent fewer errors than humans when enforcing content policies while catching 10 percent more actual violations. Unlike traditional ML classifiers that struggle with satire or evolving language, the language models are supposed to better grasp nuance and cover more languages.

Employees paint a different picture. One insider says the models still remove or shadow-ban harmless content, and there isn't enough oversight for such a rapid rollout. The transition is already leading to layoffs, especially among external contractors. There's also a model swap happening behind the scenes, the Financial Times reports. Meta had been using Google's Gemini for moderation and support but recently told staff to switch to its own new foundation model called Muse Spark. The models are trained on past decisions made by human reviewers.

Source: The Decoder

Key points

Meta has replaced roughly half of all human moderation requests with large language models in 2025.
Meta plans to push that share above 90 percent for some content types by the end of the year.
The shift is expected to save the company billions annually, according to the Financial Times.
Meta disputes the cost argument and points to quality instead.
Meta says that since March, tests show its language models make 13 percent fewer errors than humans when enforcing content policies and catch 10 percent more actual violations.
Employees report that the models still remove or shadow-ban harmless content and that there isn't enough oversight for the rapid rollout.

WRITTEN BY

Nadia Rahman

AI Safety, Alignment & Policy

Nadia follows AI safety, alignment, regulation, and the policy debates shaping the field.