Evaluating Explanation Robustness to Model Pruning

Explainability methods aim to enhance model transparency by tracking gradient propagation or observing correlations between inputs and outputs in order to identify the features that are crucial for decision making. Nevertheless, existing studies suggest that the reliability of explainability methods is controversial due to several counter-intuitive properties, such as failing sanity checks or lacking invariance to linear transformations. In this work, we examine the plausibility of explainability approaches from a novel perspective, namely their robustness to model pruning. We show that even when only the neurons with the least importance are eliminated and prediction performance shows no noticeable change, the explanations are dramatically corrupted. Extensive experiments qualitatively and quantitatively illustrate that most popular explainability methods are insufficiently robust even to the simplest model pruning algorithms.
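The evaluation idea can be illustrated with a minimal sketch: prune the least important weights of a trained classifier, verify that predictions barely change, and then compare the explanations produced before and after pruning. The snippet below assumes a PyTorch classifier and uses L1-magnitude pruning together with a simple input-gradient explanation; the helper names (`gradient_saliency`, `prune_least_important`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.utils.prune as prune

def gradient_saliency(model, x):
    """Simple input-gradient explanation: |d(max class score) / d(input)|."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    score.backward()
    return x.grad.abs()

def prune_least_important(model, amount=0.2):
    """L1-magnitude pruning: zero out the weights with the smallest
    absolute values in every linear and convolutional layer."""
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return model

# Usage (model and x_batch are any trained classifier and input batch):
# expl_before = gradient_saliency(model, x_batch)
# prune_least_important(model, amount=0.2)
# expl_after = gradient_saliency(model, x_batch)
# One possible robustness score: cosine similarity of the flattened maps.
# cos = torch.nn.functional.cosine_similarity(
#     expl_before.flatten(1), expl_after.flatten(1), dim=1)
```

A low similarity between `expl_before` and `expl_after` despite unchanged accuracy would indicate the kind of explanation fragility the paper reports; the actual metrics and pruning strategies used in the study may differ.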

  • Published in:
    2024 International Joint Conference on Neural Networks (IJCNN)
  • Type:
    Inproceedings
  • Authors:
    Tan, Hanxiao
  • Year:
    2024

Citation information

Tan, Hanxiao: Evaluating Explanation Robustness to Model Pruning. In: 2024 International Joint Conference on Neural Networks (IJCNN), IEEE, June 2024, pp. 1–8.

Associated Lamarr Researchers


Hanxiao Tan

Scientist