Estonian study finds AI models still vulnerable to propaganda prompts
A new Estonian study finds many AI systems can still be steered by propaganda-style prompts, with results varying widely by model and language.
Researchers at the Institute of the Estonian Language (EKI) tested how AI systems respond to narratives linked to Kremlin information campaigns as part of broader efforts to assess the reliability of large language models (LLMs) in Estonian contexts.
Benchmarking was carried out in cooperation with disinformation experts from Propastop, who helped identify key narratives used in Russian influence operations.
While models often appear reliable in neutral settings, biased or targeted prompts can push some to reproduce misleading talking points.
In a key finding, weaknesses tend to appear only when users steer conversations with loaded questions or intentionally seek biased content. In some cases, models became up to twice as likely to generate propaganda-like responses, repeating Kremlin talking points.
EKI researchers tested the systems in Estonian, English and Russian, and while high-end models generally resisted manipulation across all three languages, lower-cost and open models showed notably weaker performance in Russian.
"Open models are the only option for many institutions, but these don't yet meet the needs of the Estonian information space," said EKI AI alignment lead Krister Kruusmaa, adding that this gap needs to be addressed.
EKI testing showed that high-end commercial systems were the most resistant to such manipulation, with Anthropic's models among the strongest performers.
Other systems are more uneven. Kruusmaa said Google's Gemini models performed inconsistently despite strong Estonian cultural and language capability elsewhere.
Older systems such as GPT-3.5 and GPT-4o Mini, plus open models like Meta's Llama and the French-developed Mistral, ranked lower.
Russian-language prompts proved especially problematic, producing more propaganda-aligned responses across models.
Kruusmaa pointed to the large share of biased and propaganda-like content in Russian-language training data as one likely factor, though LLMs' English-centered alignment may also play a role.
In top models, differences between languages were minimal. In weaker systems, gaps reached up to 15 percentage points.
EKI director Arvi Tavast warned that foreign troll factories can generate fabricated content that is used to skew AI models.
"This is a dangerous trend," he said, calling for active efforts to ensure the information environment in Estonia remains balanced.
Kruusmaa warned that Russia is making systematic efforts to bias training data.
Source: ERR News (EE). This summary was compiled automatically; see the original article for the full text.