Exploring Curriculum Learning for Languages: Lessons from Regular Language Tasks
Despite its intuitive appeal, the effectiveness of data-level curriculum learning (CL) remains debated, mainly due to the absence of unambiguous notions of sample difficulty in real-world tasks. As a step towards a better understanding of the effective use of different curriculum strategies in natural language learning, we study CL in the context of regular languages, where both ground truth and sample difficulty can be precisely defined using deterministic finite automata. We consider two natural measures of difficulty: a data-driven metric based on input length and a task-specific metric derived from the automaton’s structure. Training RNNs and LSTMs across ten regular language classification tasks, we find that CL is not just beneficial but, in some cases, essential for generalisation. Surprisingly, straightforward data-driven curricula outperform more complex task-specific strategies, with the most successful approaches oversampling the shorter lengths early in training.
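To make the length-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy regular language (strings over {a, b} with an even number of 'a's), a hypothetical `length_weights` schedule that favours short strings early and becomes uniform by the end of training, and a hypothetical `sample_batch` helper. The DFA supplies the ground-truth label, and string length serves as the data-driven difficulty measure.

```python
import math
import random

# Hypothetical DFA over {a, b}: accepts strings with an even number of 'a's.
# This toy language is an assumption for illustration, not one of the paper's tasks.
TRANSITIONS = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
START, ACCEPTING = 0, {0}


def dfa_accepts(s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


def length_weights(max_len, progress, sharpness=5.0):
    """Sampling probabilities over lengths 1..max_len (assumed schedule).

    progress is in [0, 1]: at 0 short lengths are heavily oversampled,
    at 1 all lengths are equally likely.
    """
    decay = sharpness * (1.0 - progress)
    raw = [math.exp(-decay * (n - 1) / max_len) for n in range(1, max_len + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(batch_size, max_len, progress, rng):
    """Draw (string, label) pairs with lengths chosen by the curriculum."""
    lengths = rng.choices(
        list(range(1, max_len + 1)),
        weights=length_weights(max_len, progress),
        k=batch_size,
    )
    batch = []
    for n in lengths:
        s = "".join(rng.choice("ab") for _ in range(n))
        batch.append((s, dfa_accepts(s)))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    for progress in (0.0, 0.5, 1.0):
        batch = sample_batch(1000, 20, progress, rng)
        mean_len = sum(len(s) for s, _ in batch) / len(batch)
        print(f"progress={progress:.1f} mean length={mean_len:.2f}")
```

Running the script prints the mean sampled length at three points in training. Under the assumed schedule it should grow as `progress` approaches 1.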
- Published in: Discovery Science
- Type: Inproceedings
- Year: 2025
Citation information
Toborek, Vanessa; Seiffarth, Florian; Müller, Sebastian; Horváth, Tamás; Bauckhage, Christian: Exploring Curriculum Learning for Languages: Lessons from Regular Language Tasks, Discovery Science, 2025, 571--586, Springer Nature Switzerland, Toborek.etal.2025a
@Inproceedings{Toborek.etal.2025a,
author={Toborek, Vanessa and Seiffarth, Florian and Müller, Sebastian and Horváth, Tamás and Bauckhage, Christian},
title={Exploring Curriculum Learning for Languages: Lessons from Regular Language Tasks},
booktitle={Discovery Science},
pages={571--586},
publisher={Springer Nature Switzerland},
year={2025},
abstract={Despite its intuitive appeal, the effectiveness of data-level curriculum learning ({CL}) remains debated, mainly due to the absence of unambiguous notions of sample difficulty in real-world tasks. As a step towards a better understanding of the effective use of different curriculum strategies in natural language learning, we study {CL} in the context of regular languages, where both ground truth...}}