Investigation of Drift Detection for Clinical Text Classification
Today, machine learning models are in productive use across various healthcare applications. The availability of extensive patient information in electronic formats makes it possible to develop machine learning-based models for data analysis. However, the performance of an operational model is continuously at risk of degradation due to unforeseen changes in the input data flow. Monitoring data drift is therefore essential to maintain the desired performance of trained models. In the context of monitoring and drift detection, statistical hypothesis testing allows us to examine whether incoming data deviate from the training data. Recent studies show that Kernel Maximum Mean Discrepancy (KMMD) and the Kolmogorov–Smirnov (KS) test can reliably measure the distance between multivariate distributions and can hence be used for drift detection. In this work, we conduct a case study on drift detection based on textual data from drug reviews and propose a sub-sampling method to stabilize drift detection. The results of our experiments show that both KMMD and KS detect changes in the text reviews with only a limited number of reviews in both the reference and test data.
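To make the idea of hypothesis-test-based drift detection with sub-sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes the reviews have already been mapped to numeric feature vectors (here replaced by synthetic Gaussian data), applies a two-sample KS test per feature with a Bonferroni correction, and repeats the test on random sub-samples. The function names, the sub-sample size, the number of rounds, and the idea of reporting the fraction of flagged rounds are all illustrative assumptions; the paper's actual sub-sampling scheme is not described in this record.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))


def drift_fraction(reference, test, subsample_size=100, rounds=50,
                   alpha=0.05, seed=0):
    """Fraction of sub-sampling rounds in which any feature is flagged.

    Hypothetical scheme: each round draws one sub-sample from each data set
    and runs a per-feature KS test with a Bonferroni-corrected threshold.
    """
    rng = random.Random(seed)
    n_features = len(reference[0])
    threshold = alpha / n_features  # Bonferroni correction across features
    flagged = 0
    for _ in range(rounds):
        ref_sub = rng.sample(reference, subsample_size)
        test_sub = rng.sample(test, subsample_size)
        for f in range(n_features):
            ref_col = [row[f] for row in ref_sub]
            test_col = [row[f] for row in test_sub]
            d = ks_statistic(ref_col, test_col)
            if ks_pvalue(d, len(ref_col), len(test_col)) < threshold:
                flagged += 1
                break
    return flagged / rounds


def make_rows(n, shift, rng):
    """Synthetic stand-in for text features: only the first feature is shifted."""
    return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
            for _ in range(n)]


rng = random.Random(42)
reference = make_rows(500, 0.0, rng)
same = make_rows(500, 0.0, rng)       # drawn from the reference distribution
shifted = make_rows(500, 1.5, rng)    # first feature shifted by 1.5

print("fraction flagged, no drift:", drift_fraction(reference, same))
print("fraction flagged, drift:   ", drift_fraction(reference, shifted))
```

Aggregating over many sub-samples, rather than relying on a single test, is one plausible way to make the decision less sensitive to which reviews happen to be drawn. A kernel-based test such as KMMD could be substituted for the per-feature KS test in the inner loop.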
- Published in: International Workshop on Health Intelligence
- Type: Inproceedings
- Authors: Abdelwahab, Hammam; Martens, Claudio; Beck, Niklas; Wegener, Dennis
- Year: 2023
Citation information
Abdelwahab, Hammam; Martens, Claudio; Beck, Niklas; Wegener, Dennis: Investigation of Drift Detection for Clinical Text Classification, International Workshop on Health Intelligence, 2023, https://link.springer.com/chapter/10.1007/978-3-031-36938-4_4
@Inproceedings{Abdelwahab.etal.2023a,
author={Abdelwahab, Hammam and Martens, Claudio and Beck, Niklas and Wegener, Dennis},
title={Investigation of Drift Detection for Clinical Text Classification},
booktitle={International Workshop on Health Intelligence},
url={https://link.springer.com/chapter/10.1007/978-3-031-36938-4_4},
year={2023},
abstract={Today, machine learning models are applied in various healthcare applications in productive use. The availability of extensive patient information in electronic formats makes it possible to utilize them and develop machine learning-based models for data analysis. However, the performance of an operational model is continuously subject to degradation due to unforeseen changes in the input data...}}