Evaluating Explanation Robustness to Model Pruning
Explainability methods aim to enhance model transparency by tracing gradient propagation or by observing correlations between inputs and outputs, thereby revealing the features that are crucial for decision making. Nevertheless, existing studies suggest that the reliability of explainability methods is controversial due to several counter-intuitive properties, such as failure of sanity tests or a lack of invariance to linear transformations. In this work, we examine the plausibility of explainability approaches from a novel perspective: robustness to model pruning. We show that even when only the least important neurons are eliminated and the prediction performance shows no noticeable fluctuation, the explanations are dramatically corrupted. Extensive experiments qualitatively and quantitatively illustrate that most popular explainability methods are insufficiently robust to even the simplest model pruning algorithms.
- Published in: 2024 International Joint Conference on Neural Networks (IJCNN)
- Type: Inproceedings
- Authors: Tan, Hanxiao
- Year: 2024
Citation information
Tan, Hanxiao: Evaluating Explanation Robustness to Model Pruning. In: 2024 International Joint Conference on Neural Networks (IJCNN), IEEE, June 2024, pp. 1–8.
@Inproceedings{Tan.2024a,
author={Tan, Hanxiao},
title={Evaluating Explanation Robustness to Model Pruning},
booktitle={2024 International Joint Conference on Neural Networks (IJCNN)},
pages={1--8},
month={June},
publisher={IEEE},
year={2024},
abstract={Explainability methods aim to enhance model transparency by tracing gradient propagation or by observing correlations between inputs and outputs, thereby revealing the features that are crucial for decision making. Nevertheless, existing studies suggest that the reliability of explainability methods is controversial due to several counter-intuitive properties...}}
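Code sketch
Below is a minimal sketch of the evaluation idea described in the abstract, assuming a PyTorch classifier. The toy model, the choice of L1 magnitude pruning, vanilla gradient saliency as the explanation method, and Spearman rank correlation as the comparison metric are illustrative assumptions, not necessarily the paper's exact setup.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from scipy.stats import spearmanr

def gradient_saliency(model, x, target):
    # Vanilla gradient attribution: |d logit_target / d input|.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    logits[0, target].backward()
    return x.grad.abs().detach()

def prune_least_important(model, amount=0.1):
    # Remove the smallest-magnitude weights in every linear layer,
    # i.e., the connections deemed least important under an L1 criterion.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return model

# Hypothetical toy classifier and input (an MNIST-sized image).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                      nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(1, 1, 28, 28)
target = model(x).argmax(dim=1).item()

saliency_before = gradient_saliency(model, x, target)
prune_least_important(model, amount=0.1)
saliency_after = gradient_saliency(model, x, target)

# A low rank correlation signals that pruning corrupted the explanation
# even if the prediction itself barely changed.
rho, _ = spearmanr(saliency_before.flatten().numpy(),
                   saliency_after.flatten().numpy())
print(f"Spearman rank correlation of saliency maps: {rho:.3f}")

In the paper's framing, a robust explainability method should yield a high correlation under such a mild pruning of the least important weights; the abstract reports that popular methods fall well short of this.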