A New Aligned Simple German Corpus
“Leichte Sprache”, the German counterpart to Simple English, is a regulated language that aims to simplify complex written language that would otherwise remain inaccessible to various groups of people. We present a new sentence-aligned monolingual corpus for Simple German and German. It contains multiple document-aligned sources, which we have aligned using automatic sentence-alignment methods. We evaluate our alignments on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under the MIT license.
- Published in: Annual Meeting of the Association for Computational Linguistics
- Type: Inproceedings
- Authors: Toborek, Vanessa; Busch, Moritz; Boßert, Malte; Bauckhage, Christian; Welke, Pascal
- Year: 2023
Citation information
Toborek, Vanessa; Busch, Moritz; Boßert, Malte; Bauckhage, Christian; Welke, Pascal: A New Aligned Simple German Corpus, Annual Meeting of the Association for Computational Linguistics, 2023, https://aclanthology.org/2023.acl-long.638/
@Inproceedings{Toborek.etal.2023a,
author={Toborek, Vanessa and Busch, Moritz and Boßert, Malte and Bauckhage, Christian and Welke, Pascal},
title={A New Aligned Simple German Corpus},
booktitle={Annual Meeting of the Association for Computational Linguistics},
url={https://aclanthology.org/2023.acl-long.638/},
year={2023},
abstract={``Leichte Sprache'', the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods....}}