Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion
Deep learning models for dialect identification are often limited by the scarcity of dialectal data. To address this challenge, we propose to use Retrieval-based Voice Conversion (RVC) as an effective data augmentation method for a low-resource German dialect classification task. By converting audio samples to a uniform target speaker, RVC minimizes speaker-related variability, enabling models to focus on dialect-specific linguistic and phonetic features. Our experiments demonstrate that RVC enhances classification performance when utilized as a standalone augmentation method. Furthermore, combining RVC with other augmentation methods such as frequency masking and segment removal leads to additional performance gains, highlighting its potential for improving dialect classification in low-resource scenarios.
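The two conventional augmentations named in the abstract, frequency masking and segment removal, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mask width, the segment length, and the (frequency, time) spectrogram layout are all assumptions made for illustration. RVC itself is not shown, since it requires a trained voice conversion model.

```python
import numpy as np


def frequency_mask(spec, max_width, rng):
    """Zero out one random band of frequency bins in a (freq, time) spectrogram.

    Hypothetical helper: the band width is drawn uniformly from [0, max_width].
    """
    n_freq = spec.shape[0]
    max_width = min(max_width, n_freq)  # guard against masks wider than the input
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, n_freq - width + 1))
    out = spec.copy()  # leave the caller's array untouched
    out[start:start + width, :] = 0.0
    return out


def remove_segment(wave, seg_len, rng):
    """Cut one random contiguous run of seg_len samples out of a 1-D waveform.

    Hypothetical helper: the remaining audio is joined without crossfading.
    """
    seg_len = min(seg_len, len(wave))
    start = int(rng.integers(0, len(wave) - seg_len + 1))
    return np.concatenate([wave[:start], wave[start + seg_len:]])


rng = np.random.default_rng(0)
spec = np.ones((80, 200))                      # assumed: 80 mel bins, 200 frames
wave = np.arange(16000, dtype=np.float32)      # assumed: 1 s of audio at 16 kHz

masked = frequency_mask(spec, max_width=10, rng=rng)
shortened = remove_segment(wave, seg_len=1600, rng=rng)
```

In a setup like the one the abstract describes, such transforms would be applied to training samples only, either on their own or on top of audio that has already been converted to the uniform target speaker.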
- Published in: Proc. Interspeech 2025
- Type: Inproceedings
- Year: 2025
- Source: https://www.isca-speech.org/archive/interspeech_2025/fischbach25_interspeech.html
Citation information
Fischbach, Lea; Karimi, Akbar; Kleen, Caroline; Lameli, Alfred; Flek, Lucie: Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion, Proc. Interspeech 2025, 2025, pp. 2780--2784, https://www.isca-speech.org/archive/interspeech_2025/fischbach25_interspeech.html
@Inproceedings{Fischbach.etal.2025b,
author={Fischbach, Lea and Karimi, Akbar and Kleen, Caroline and Lameli, Alfred and Flek, Lucie},
title={Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion},
booktitle={Proc. Interspeech 2025},
pages={2780--2784},
url={https://www.isca-speech.org/archive/interspeech_2025/fischbach25_interspeech.html},
year={2025},
abstract={Deep learning models for dialect identification are often limited by the scarcity of dialectal data. To address this challenge, we propose to use Retrieval-based Voice Conversion (RVC) as an effective data augmentation method for a low-resource German dialect classification task. By converting audio samples to a uniform target speaker, RVC minimizes speaker-related variability, enabling models to...}}