Investigation of Drift Detection for Clinical Text Classification

Today, machine learning models are used in production in a variety of healthcare applications. The availability of extensive patient information in electronic form makes it possible to develop machine learning-based models for data analysis. However, the performance of a deployed model is continuously at risk of degradation due to unforeseen changes in the input data. Monitoring data drift is therefore essential to maintain the desired performance of trained models. In the context of monitoring and drift detection, statistical hypothesis testing lets us examine whether incoming data deviate from the training data. Recent studies show that Kernel Maximum Mean Discrepancy (KMMD) and Kolmogorov–Smirnov (KS) tests can reliably measure the distance between multivariate distributions and can therefore be used for drift detection. In this work, we conduct a case study on drift detection based on textual data from drug reviews and propose a sub-sampling method to stabilize drift detection. Our experiments show that both KMMD and KS detect changes in the text reviews using only a limited number of reviews in both the reference and the test data.
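The abstract names two tests and a sub-sampling step without giving implementation details, so the following is only an illustrative sketch under stated assumptions, not the authors' code. It assumes each review has already been mapped to a fixed-length numeric vector (for example, a sentence embedding; toy 2-D Gaussian vectors stand in here). It computes a Gaussian-kernel MMD² with a permutation p-value, applies the univariate KS test per dimension with a Bonferroni correction (a common way to use KS on multivariate data, assumed here), and reruns a test on random sub-samples. All function names (`mmd_permutation_test`, `ks_drift`, `subsampled_detection_rate`) and parameters (`gamma`, `alpha`, `sample_size`, `n_rounds`) are hypothetical, and the sub-sampling loop is just one plausible reading of "sub-sampling to stabilize drift detection".

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Biased estimate of the squared MMD between two samples."""
    n, m = len(xs), len(ys)
    kxx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (n * n)
    kyy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (m * m)
    kxy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (n * m)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(ref, test, gamma=0.5, n_perm=100, seed=0) -> float:
    """Permutation p-value: share of random relabellings with MMD^2 >= the observed one."""
    rng = random.Random(seed)
    observed = mmd2(ref, test, gamma)
    pooled = list(ref) + list(test)
    n = len(ref)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:n], pooled[n:], gamma) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every value <= x, then compare them.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic p-value from the Kolmogorov distribution (approximate for small samples)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.1:
        return 1.0  # the series converges poorly here and the p-value is ~1 anyway
    s = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(max(s, 0.0), 1.0)


def ks_drift(ref, test, alpha: float = 0.05) -> bool:
    """Flag drift if any dimension's KS test rejects at the Bonferroni level alpha / dims."""
    dims = len(ref[0])
    for k in range(dims):
        a = [v[k] for v in ref]
        b = [v[k] for v in test]
        if ks_pvalue(ks_statistic(a, b), len(a), len(b)) < alpha / dims:
            return True
    return False


def subsampled_detection_rate(ref, test, detector: Callable[[list, list], bool],
                              sample_size: int, n_rounds: int = 10, seed: int = 0) -> float:
    """Hypothetical stabilisation: rerun `detector` on random sub-samples, return the rejection rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rounds):
        r = rng.sample(list(ref), sample_size)
        t = rng.sample(list(test), sample_size)
        hits += int(detector(r, t))
    return hits / n_rounds


if __name__ == "__main__":
    rng = random.Random(42)
    ref = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
    shifted = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(60)]
    print("MMD p-value:", mmd_permutation_test(ref[:30], shifted[:30]))
    print("KS drift:", ks_drift(ref, shifted))
    print("sub-sampled KS rate:", subsampled_detection_rate(ref, shifted, ks_drift, sample_size=20))
```

Reporting the fraction of sub-sampled rounds that flag drift, rather than a single accept/reject decision, is one way to make the outcome less sensitive to which few reviews happen to be drawn. The stdlib-only MMD is quadratic in sample size and is meant for small illustrative samples only.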

Published in:
Artificial Intelligence for Personalized Medicine. W3PHAI 2023
Type:
Inproceedings
Authors:
Abdelwahab, Hammam; Martens, Claudio; Beck, Niklas; Wegener, Dennis
Year:
2023
Source:
https://link.springer.com/chapter/10.1007/978-3-031-36938-4_4

Citation information

Abdelwahab, Hammam; Martens, Claudio; Beck, Niklas; Wegener, Dennis: Investigation of Drift Detection for Clinical Text Classification. In: Artificial Intelligence for Personalized Medicine. W3PHAI 2023, 2023. https://link.springer.com/chapter/10.1007/978-3-031-36938-4_4

Associated Lamarr Researchers

Claudio Martens

Dennis Wegener