Meta has already replaced roughly half of all human moderation requests with large language models in 2025 and plans to push that share above 90 percent for some content types by the end of the year. The shift is expected to save the company billions annually, according to the Financial Times. Meta disputes the cost argument and points to quality instead, saying that since March, tests show its language models make 13 percent fewer errors than humans when enforcing content policies while catching 10 percent more actual violations. Unlike traditional ML classifiers that struggle with satire or evolving language, the language models are supposed to better grasp nuance and cover more languages.

Employees paint a different picture. One insider says the models still remove or shadow-ban harmless content, and there isn't enough oversight for such a rapid rollout. The transition is already leading to layoffs, especially among external contractors. There's also a model swap happening behind the scenes, the Financial Times reports. Meta had been using Google's Gemini for moderation and support but recently told staff to switch to its own new foundation model called Muse Spark. The models are trained on past decisions made by human reviewers.

Meta said the models make 13 percent fewer errors than humans when enforcing content policies and catch 10 percent more actual violations since March. The company disputes the cost argument and emphasizes quality over expense. Employees, however, report that the models still remove or shadow-ban harmless content and that there isn't enough oversight for the rapid rollout.

Source: thedecoder