Robustness Evaluation of the German Extractive Question Answering Task

To ensure reliable performance of Question Answering (QA) systems, evaluation of robustness is crucial. Common evaluation benchmarks typically include only performance metrics, such as Exact Match (EM) and the F1 score. However, these benchmarks overlook critical factors for the deployment of QA systems. This oversight can result in systems that are vulnerable to minor perturbations in the input, such as typographical errors. While several methods have been proposed to test the robustness of QA models, there has been minimal exploration of these approaches for languages other than English. This study focuses on the robustness evaluation of German-language QA models, extending methodologies previously applied primarily to English. The objective is to nurture the development of robust models by defining an evaluation method specifically tailored to the German language. We assess the applicability of perturbations used in English QA models for German and perform a comprehensive experimental evaluation with eight models. The results show that all models are vulnerable to character-level perturbations. Additionally, the comparison of monolingual and multilingual models suggests that the former are less affected by character- and word-level perturbations.
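To make the terms in the abstract concrete, the following is a minimal sketch of a character-level perturbation and of the two metrics it names. It is not the perturbation set or the scoring code used in the paper: the function names, the adjacent-character swap as a typo model, and the simplified normalization (lowercasing and whitespace tokenization only) are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def swap_adjacent_chars(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical typo model: swap two adjacent inner characters in a share of words."""
    rng = random.Random(seed)
    perturbed = []
    for word in text.split(" "):
        # Only touch words with at least two inner characters to swap.
        if len(word) >= 4 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # first and last character stay fixed
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        perturbed.append(word)
    return " ".join(perturbed)


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the answers are identical after lowercasing and trimming, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between predicted and gold answer (simplified normalization)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a robustness check would score a model's answers on the original questions and again on questions passed through `swap_adjacent_chars`, then compare the average EM and F1 of the two runs.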

Published in:
Proceedings of the 31st International Conference on Computational Linguistics
Type:
Inproceedings
Authors:
Satheesh, Shalaka; Beckh, Katharina; Klug, Katrin; Allende-Cid, Héctor; Houben, Sebastian; Hassan, Teena
Year:
2025
Source:
https://aclanthology.org/2025.coling-main.121/

Citation information

Satheesh, Shalaka; Beckh, Katharina; Klug, Katrin; Allende-Cid, Héctor; Houben, Sebastian; Hassan, Teena: Robustness Evaluation of the German Extractive Question Answering Task. Proceedings of the 31st International Conference on Computational Linguistics, pp. 1785–1801, January 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.121/

Associated Lamarr Researchers

Katharina Beckh

Dr. Sebastian Houben