MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning

Propaganda, a pervasive tool for influencing public opinion, demands robust automated detection systems, particularly for under-resourced languages. Current efforts largely focus on well-resourced languages like English, leaving significant gaps in languages such as Arabic. This research addresses these gaps by introducing MultiProp Framework, a cross-lingual meta-learning framework designed to enhance propaganda detection across multiple languages, including Arabic, German, Italian, French and English. We constructed a multilingual dataset using data translation techniques, beginning with Arabic data from PTC and WANLP shared tasks, and expanded it with translations into German, Italian and French, further enriched by the SemEval23 dataset. Our proposed framework encompasses three distinct models: MultiProp-Baseline, which combines ensembles of pre-trained models such as GPT-2, mBART, and XLM-RoBERTa; MultiProp-ML, designed to handle languages with minimal or no training data by utilizing advanced meta-learning techniques; and MultiProp-Chunk, which overcomes the challenges of processing longer texts that exceed the token limits of pre-trained models. Together, they deliver superior performance compared to state-of-the-art methods, representing a significant advancement in the field of cross-lingual propaganda detection.

Published in:
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Type:
Inproceedings
Authors:
Aldabbas, Farizeh; Ashraf, Shaina; Sifa, Rafet; Flek, Lucie
Year:
2025
Source:
https://aclanthology.org/2025.abjadnlp-1.2/

Citation information

Aldabbas, Farizeh; Ashraf, Shaina; Sifa, Rafet; Flek, Lucie: MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning, Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, 2025, pp. 7-22, January, Association for Computational Linguistics, https://aclanthology.org/2025.abjadnlp-1.2/

Associated Lamarr Researchers

Prof. Dr. Rafet Sifa

Prof. Dr. Lucie Flek