Active Sampling for Learning Interpretable Surrogate Machine Learning Models

Authors: A. Saadallah, K. Morik
Journal: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)
Year: 2020

Citation information

A. Saadallah, K. Morik,
2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA),
2020,
264-272,
IEEE,
Sydney, NSW, Australia,
https://doi.org/10.1109/DSAA49011.2020.00039

Machine learning methods are increasingly used to inform consequential decisions across many fields. As a result, the ability to interpret these models has become ever more crucial to the acceptance and reliability of the technologies built on them. In this paper, we propose an active sampling approach for accurately learning interpretable surrogate machine learning models that better approximate black-box models in supervised learning problems. The surrogate model is trained to mimic the black-box model and reflect its properties. Active sampling serves as an informed sampling method that adaptively and iteratively builds an optimized training set from the predictions of the black-box model, in order to enhance the accuracy of the surrogate model. The surrogate model is then used to interpret and debug the black-box model. The method is flexible: it can approximate any family of black-box models with any type of interpretable machine learning model, since it only requires the ability to compute their outputs. It is also applicable to both regression and classification tasks. In this work, we focus on decision trees because of their proven high interpretability. An experimental evaluation of the method on several real-world data sets is presented to show its flexibility and its robustness compared to traditional approaches for learning surrogate models.
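The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the sampling criterion, so the sketch assumes a simple one (query the candidate points where the current surrogate disagrees most with the black box). The names `black_box`, `fit_tree`, `predict`, and `active_surrogate`, the one-dimensional toy tree, and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def black_box(x):
    """Stand-in for an opaque model: we may only call it, never inspect it."""
    return 1.0 if x > 0.37 else 0.0

def sse(vals):
    """Sum of squared errors of vals around their mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def fit_tree(xs, ys, depth):
    """Fit a tiny 1-D regression tree: a leaf is a float, a node is (thr, left, right)."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(set(ys)) == 1:
        return mean
    pairs = sorted(zip(xs, ys))
    best = None  # (error, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical inputs
        err = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or err < best[0]:
            best = (err, i)
    if best is None:
        return mean
    i = best[1]
    thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
    left, right = pairs[:i], pairs[i:]
    return (thr,
            fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            fit_tree([x for x, _ in right], [y for _, y in right], depth - 1))

def predict(tree, x):
    """Walk the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree

def active_surrogate(black_box, pool, n_init=5, rounds=10, batch=2, depth=3, seed=0):
    """Grow the training set iteratively, guided by black-box predictions."""
    rng = random.Random(seed)
    pool = list(pool)  # work on a copy so the caller's pool is untouched
    rng.shuffle(pool)
    train_x, pool = pool[:n_init], pool[n_init:]
    train_y = [black_box(x) for x in train_x]
    tree = fit_tree(train_x, train_y, depth)
    for _ in range(rounds):
        if not pool:
            break
        # Assumed criterion: largest surrogate/black-box disagreement first.
        pool.sort(key=lambda x: abs(predict(tree, x) - black_box(x)), reverse=True)
        picked, pool = pool[:batch], pool[batch:]
        train_x += picked
        train_y += [black_box(x) for x in picked]
        tree = fit_tree(train_x, train_y, depth)  # refit on the enlarged set
    return tree, train_x, train_y

pool = [i / 200 for i in range(200)]
tree, train_x, train_y = active_surrogate(black_box, pool)
fidelity = sum(predict(tree, x) == black_box(x) for x in pool) / len(pool)
print(f"queried {len(train_x)} points, fidelity on pool = {fidelity:.2f}")
```

The sketch only ever calls `black_box` for its outputs, which mirrors the abstract's claim that the approach needs nothing more than the ability to compute the black-box model's predictions. Fidelity (agreement between surrogate and black box) is used here as the quality measure; in practice one would measure it on held-out points rather than on the candidate pool.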