MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning

Propaganda, a pervasive tool for influencing public opinion, demands robust automated detection systems, particularly for under-resourced languages. Current efforts largely focus on well-resourced languages such as English, leaving significant gaps in languages such as Arabic. This research addresses these gaps by introducing the MultiProp Framework, a cross-lingual meta-learning framework designed to enhance propaganda detection across multiple languages: Arabic, German, Italian, French, and English. We constructed a multilingual dataset through translation-based data augmentation, starting from the Arabic data of the PTC and WANLP shared tasks, translating it into German, Italian, and French, and further enriching it with the SemEval-2023 dataset. The framework comprises three models: MultiProp-Baseline, an ensemble of pre-trained models such as GPT-2, mBART, and XLM-RoBERTa; MultiProp-ML, which uses meta-learning to handle languages with little or no training data; and MultiProp-Chunk, which segments longer texts that exceed the token limits of pre-trained models. Together, these models outperform the state-of-the-art methods they are compared against, representing a significant advancement in cross-lingual propaganda detection.
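The abstract does not specify how MultiProp-Chunk splits long texts or how MultiProp-Baseline combines its members. The following is a minimal sketch under stated assumptions, not the authors' implementation: texts are assumed to be cut into overlapping fixed-size token windows, each model's chunk-level class probabilities are assumed to be mean-pooled, and the ensemble is assumed to use soft voting (averaging probabilities across models). The names `chunk_tokens`, `mean_vector`, `ensemble_predict`, and `Classifier`, as well as the default window of 512 tokens with a stride of 256, are hypothetical choices for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a classifier maps a token sequence to class probabilities.
Classifier = Callable[[Sequence[str]], List[float]]


def chunk_tokens(tokens: Sequence[str], max_len: int = 512, stride: int = 256) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    A new window starts every `stride` tokens; stride < max_len makes windows
    overlap, so text near a window boundary is also seen with more context.
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("require max_len > 0 and 0 < stride <= max_len")
    if not tokens:
        return [[]]
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):  # this window reaches the end of the text
            break
        start += stride
    return chunks


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally long probability vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def ensemble_predict(models: List[Classifier], tokens: Sequence[str],
                     max_len: int = 512, stride: int = 256) -> List[float]:
    """Chunk the input, mean-pool chunk probabilities per model, then soft-vote across models."""
    chunks = chunk_tokens(tokens, max_len, stride)
    per_model = [mean_vector([model(chunk) for chunk in chunks]) for model in models]
    return mean_vector(per_model)
```

Mean pooling treats every chunk equally; max pooling over chunks is a common alternative when a single propagandistic passage should dominate the document-level decision. The abstract does not say which aggregation, if any, MultiProp-Chunk uses.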

Published in:
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Type:
Inproceedings
Authors:
Aldabbas, Farizeh; Ashraf, Shaina; Sifa, Rafet; Flek, Lucie
Year:
2025
Source:
https://aclanthology.org/2025.abjadnlp-1.2/

Citation information

Aldabbas, Farizeh; Ashraf, Shaina; Sifa, Rafet; Flek, Lucie: MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning, Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, pp. 7–22, January 2025, Association for Computational Linguistics, https://aclanthology.org/2025.abjadnlp-1.2/

Associated Lamarr researchers

Prof. Dr. Rafet Sifa

Prof. Dr. Lucie Flek