Fusing Speech and Language Models for Dementia Detection
Accurate detection of dementia is crucial for timely intervention and care, and leveraging multimodal data holds significant potential for improving diagnostic accuracy. In this study, we explore deep learning approaches for dementia classification using the Pitt corpus, which contains brief participant descriptions of the Cookie Theft picture. We analyze 242 control and 307 dementia audio clips to investigate various representation learning techniques. Our best-performing approach fuses audio spectrograms with the outputs of advanced language models, including transcriptions from the Whisper model and transformer-based feature extraction. We rigorously evaluate these models and find that our multimodal approach, with an F1-score of 86.42%, outperforms single-modality approaches by a considerable margin. Our findings underscore the promise of multimodal deep learning techniques for improving the reliability of dementia detection through audio analysis, possibly paving the way for more robust and accessible diagnostic tools.
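The abstract states only that spectrogram and language-model representations are fused and that results are scored by F1. As an illustration, not the authors' implementation, the sketch below assumes simple early (feature-level) fusion: each modality's embedding is L2-normalized and concatenated, then passed to a linear classifier. The embeddings, weights, the label convention (1 = dementia as the positive class), and the helper names `fuse`, `predict`, and `f1_score` are all hypothetical. It also shows how a binary F1-score is computed.

```python
import math
from typing import List, Sequence

# Assumption: label 1 = dementia (positive class), 0 = control.


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(audio_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Hypothetical early fusion: normalize each modality separately, then concatenate."""
    return l2_normalize(audio_emb) + l2_normalize(text_emb)


def predict(fused: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> int:
    """Hypothetical linear classifier: returns 1 (dementia) if the score is positive."""
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1 if score > 0 else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2PR / (P + R) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids 0/0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `fuse([3.0, 4.0], [0.0, 2.0])` yields `[0.6, 0.8, 0.0, 1.0]`, and `f1_score([1, 1, 0, 0], [1, 0, 0, 1])` returns 0.5 because precision and recall are both 0.5. Normalizing each modality before concatenation keeps an embedding with larger magnitudes from dominating the fused vector.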
- Published in: 2024 IEEE International Conference on Big Data (BigData)
- Type: Inproceedings
- Authors: Deußer, Tobias; Siddiqi, Abdul Mohsin; Sparrenberg, Lorenz; Adams, Tobias; Bauckhage, Christian; Sifa, Rafet
- Year: 2024
- Source: https://ieeexplore.ieee.org/abstract/document/10825055/footnotes#footnotes
Citation information
Deußer, Tobias; Siddiqi, Abdul Mohsin; Sparrenberg, Lorenz; Adams, Tobias; Bauckhage, Christian; Sifa, Rafet: Fusing Speech and Language Models for Dementia Detection. 2024 IEEE International Conference on Big Data (BigData), December 2024, pp. 3908–3914. https://ieeexplore.ieee.org/abstract/document/10825055/footnotes#footnotes
@Inproceedings{Deusser.etal.2024a,
author={Deußer, Tobias and Siddiqi, Abdul Mohsin and Sparrenberg, Lorenz and Adams, Tobias and Bauckhage, Christian and Sifa, Rafet},
title={Fusing Speech and Language Models for Dementia Detection},
booktitle={2024 {IEEE} International Conference on Big Data ({BigData})},
pages={3908--3914},
month={December},
url={https://ieeexplore.ieee.org/abstract/document/10825055/footnotes#footnotes},
year={2024},
abstract={Accurate detection of dementia is crucial for timely intervention and care, and leveraging multimodal data holds significant potential for improving diagnostic accuracy. In this study, we explore deep learning approaches for dementia classification using the Pitt corpus, which includes brief participant descriptions of a cookie theft scene. We analyze 242 control and 307 dementia audio clips to...}}