Estonian institute ranks LLMs on resistance to Russian propaganda

The Estonian Language Institute released a new benchmark ranking LLMs on their ability to resist Russian propaganda, with Anthropic's Claude models leading the pack.

The Estonian Language Institute has launched a new benchmark to evaluate how well large language models resist Russian propaganda. The benchmark assesses models' ability to avoid taking positions on topics central to Russian strategic narratives. The initiative comes as governments grow increasingly concerned about LLMs spreading what they view as dangerous foreign propaganda. The benchmark covers 14 categories of Russian influence operations, ranging from narratives on Crimea to justifications for the war in Ukraine. Test questions come in three forms: neutral, biased by false assumptions, or written to elicit explicit misinformation. Models are evaluated on their capacity to push back on propaganda without external help. Source: Ars Technica

Anthropic's Claude models performed best among proprietary frontier models, with various versions of its Sonnet and Opus models taking six of the top 10 spots. Opus 4.7, the top-performing model, received an 'Exemplary' rating on 77% of questions and a mean score of 94.9 out of 100. Open-weight models like Nvidia's Nemotron and Alibaba's Qwen also posted strong results, comparable to Anthropic's best models. GPT-5.4 from OpenAI achieved a mean score of 88.9, with 'Exemplary' responses on 54% of questions. Recent frontier models showed a stronger tendency to resist Russian propaganda than models from just a few years ago. Source: Ars Technica
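To make the two headline numbers concrete, here is a minimal Python sketch of how a mean score and an 'Exemplary' share could be derived from per-question grades. The article does not describe the institute's actual scoring pipeline, so the `GradedAnswer` record, the `summarize` function, the "Adequate" label, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GradedAnswer:
    # Hypothetical record for one graded model response.
    score: float  # points for this question on an assumed 0-100 scale
    rating: str   # label such as "Exemplary" (other labels are invented)


def summarize(answers: list[GradedAnswer]) -> tuple[float, float]:
    """Return (mean score, percent of answers rated 'Exemplary')."""
    if not answers:
        raise ValueError("no graded answers to summarize")
    mean_score = mean(a.score for a in answers)
    exemplary = sum(1 for a in answers if a.rating == "Exemplary")
    return mean_score, 100 * exemplary / len(answers)


# Made-up grades for illustration only, not benchmark data.
sample = [
    GradedAnswer(100, "Exemplary"),
    GradedAnswer(100, "Exemplary"),
    GradedAnswer(90, "Adequate"),
    GradedAnswer(70, "Adequate"),
]
mean_score, exemplary_pct = summarize(sample)
print(f"mean={mean_score:.1f}, exemplary={exemplary_pct:.0f}%")
```

Under these assumptions the two metrics can diverge: a model can hold a high mean while earning the top rating on a smaller share of questions, which is the pattern reported for GPT-5.4 (88.9 mean, 54% 'Exemplary').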

The benchmark also revealed that some models, like Google’s Gemini 2.5 Pro, are particularly sensitive to malicious prompts and Russian-language questions. Gemini 3.5 Flash, the most recent Google model tested, scored 73 on the benchmark, comparable to Anthropic models from nearly two years ago. The Propastop blog highlighted that many models showed less resistance to Russian propaganda when questioned in Russian. Open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash also performed worse in Russian than in English. Source: Ars Technica
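The English-versus-Russian comparison can be sketched as a simple per-model gap. This is an illustrative Python example only: the article gives no per-language figures, so the model names, the scores, and the `language_gap` helper are placeholders, not benchmark results.

```python
def language_gap(scores_by_lang: dict[str, float],
                 base: str = "en", other: str = "ru") -> float:
    """Points lost when the same questions are asked in `other` instead of `base`.

    A positive value means the model scored worse in `other`.
    """
    return scores_by_lang[base] - scores_by_lang[other]


# Placeholder models and scores, invented for illustration.
results = {
    "model-a": {"en": 90.0, "ru": 78.0},
    "model-b": {"en": 85.0, "ru": 84.0},
}

gaps = {name: language_gap(scores) for name, scores in results.items()}
most_affected = max(gaps, key=gaps.get)  # model with the largest drop in Russian
print(gaps, most_affected)
```

Reporting the gap alongside the overall score separates a model that is weak everywhere from one that is strong in English but degrades when prompted in Russian.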

Key points

The Estonian Language Institute released a new benchmark ranking LLMs on their ability to resist Russian propaganda.
Anthropic's Claude models performed best among proprietary frontier models, with Opus 4.7 achieving a mean score of 94.9 out of 100.
GPT-5.4 from OpenAI achieved an 88.9 mean score with 'Exemplary' responses on 54% of questions.
Google’s Gemini 2.5 Pro is particularly sensitive to malicious prompts and Russian-language questions, scoring 82 on the benchmark.
The Propastop blog highlighted that many models showed less resistance to Russian propaganda when questioned in Russian.
Open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash performed worse in Russian than in English.

Source: Ars Technica

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.
