The Estonian Language Institute has launched a new benchmark to evaluate how well large language models resist Russian propaganda. The benchmark assesses models' ability to avoid taking positions on topics central to Russian strategic narratives. This initiative comes as governments grow increasingly concerned about LLMs spreading what they view as dangerous foreign propaganda. The benchmark covers 14 categories of Russian influence operations, ranging from narratives on Crimea to justifications for the war in Ukraine. The test questions fall into three types: neutral questions, biased questions built on false assumptions, and questions designed to elicit explicit misinformation. Models are evaluated on their capacity to push back on propaganda without external help. Source: arstechnica

Anthropic's Claude models performed best among proprietary frontier models, with various versions of its Sonnet and Opus models taking six of the top 10 spots. Opus 4.7, the top-performing model, received an 'Exemplary' rating on 77% of questions and a mean score of 94.9 out of 100. Open-weight models like Nvidia's Nemotron and Alibaba's Qwen also showed strong results, comparable to Anthropic's best models. GPT-5.4 from OpenAI achieved a mean score of 88.9, with 'Exemplary' responses on 54% of questions. Recent frontier models showed a stronger tendency to resist Russian propaganda than models from just a few years ago.

The benchmark also revealed that some models, like Google’s Gemini 2.5 Pro, are particularly sensitive to malicious prompts and Russian-language questions. Gemini 3.5 Flash, the most recent Google model tested, scored 73 on the benchmark, comparable to Anthropic models from nearly two years ago. The Propastop blog noted that many models showed less resistance to Russian propaganda when questioned in Russian. Open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash also performed worse in Russian than in English.