Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training

This paper addresses the classification of Arabic text in Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language: few data sets are available, which in turn limits the availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. We then train and evaluate transformer-based machine learning models on it. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
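
As a rough illustration of how the two tasks named in the abstract relate, the sketch below frames NLI as a three-way classification over premise/hypothesis pairs and derives a binary CD label from it. The label names, the `NLIExample` class, and the `to_contradiction_label` helper are assumptions made for this example and are not taken from the paper; the English sentences are placeholders for Arabic data.

```python
from dataclasses import dataclass

# Conventional three-way NLI label set (an assumption; the paper's exact
# label scheme is not stated in the abstract).
NLI_LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """A single premise/hypothesis pair with its NLI label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str  # expected to be one of NLI_LABELS


def to_contradiction_label(example: NLIExample) -> int:
    """Collapse a three-way NLI label into a binary CD label (1 = contradiction)."""
    if example.label not in NLI_LABELS:
        raise ValueError(f"unknown NLI label: {example.label!r}")
    return 1 if example.label == "contradiction" else 0


# Placeholder English pairs; a real data set for this task would contain Arabic text.
examples = [
    NLIExample("The shop is open every day.", "The shop is closed on Fridays.", "contradiction"),
    NLIExample("The shop is open every day.", "The shop is open on Fridays.", "entailment"),
    NLIExample("The shop is open every day.", "The shop sells bread.", "neutral"),
]

cd_labels = [to_contradiction_label(ex) for ex in examples]
```

Under this framing, a model trained for NLI can also be read as a contradiction detector by merging the entailment and neutral classes, which is one common way to connect the two tasks.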

  • Published in:
    2023 IEEE Symposium Series on Computational Intelligence (SSCI)
  • Type:
    Inproceedings
  • Authors:
    Majd Saad Al Deen, Mohammad; Pielka, Maren; Hees, Jörn; Soulef Abdou, Bouthaina; Sifa, Rafet
  • Year:
    2023
  • Source:
    https://ieeexplore.ieee.org/document/10371891

Citation information

Majd Saad Al Deen, Mohammad; Pielka, Maren; Hees, Jörn; Soulef Abdou, Bouthaina; Sifa, Rafet: Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training, 2023 IEEE Symposium Series on Computational Intelligence (SSCI), 2023, https://ieeexplore.ieee.org/document/10371891, MajdSaadAlDeen.etal.2023a

Associated Lamarr Researchers

Maren Pielka

Author

Prof. Dr. Rafet Sifa

Principal Investigator Hybrid ML