Methods from explainable machine learning are increasingly applied. However, evaluation of these methods is often anecdotal rather than systematic. Prior work has identified properties of explanation quality, and we argue that evaluation should be based on them. In this work, we provide an evaluation process that follows the idea of property testing. The process acknowledges the central role of the human, yet argues for a quantitative approach to evaluation. We find that properties can be divided into two groups: one to ensure trustworthiness, the other to assess comprehensibility. We discuss options for quantitative property tests. Future research should focus on the standardization of testing procedures.
A Quantitative Human-Grounded Evaluation Process for Explainable ML