Robust Explanation for Free or At the Cost of Faithfulness

Zeren Tan, Yang Tian
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33534-33562, 2023.

Abstract

Explanation methods, which interpret the explicit behaviors of machine learning models, can identify implicit model characteristics and thereby improve trustworthiness. However, explanation methods have been shown to be vulnerable to adversarial perturbations, raising security concerns in high-stakes domains. In this paper, we investigate when robust explanations are necessary and what they cost. We prove that the robustness of an explanation is determined by the robustness of the model being explained; consequently, robust explanations come for free for a robust model. To obtain robust explanations for a non-robust model, we prove that composing the original model with a kernel is an effective approach that yields strictly more robust explanations. Nevertheless, we argue that this incurs a robustness-faithfulness trade-off: contrary to common expectations, an explanation method may become less faithful as it becomes more robust. This argument holds for any model. We are the first to introduce this trade-off, and we theoretically prove its existence for SmoothGrad. Our theoretical findings are verified by empirical evidence on six state-of-the-art explanation methods and four backbones.
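As context for the abstract's mention of kernel composition and SmoothGrad: SmoothGrad averages gradient explanations over Gaussian-perturbed copies of the input, which is one instance of smoothing an explanation with a (Gaussian) kernel. Below is a minimal PyTorch sketch of this idea; the function name, signature, and hyperparameter defaults are illustrative assumptions, not the paper's implementation.

    import torch

    # Minimal SmoothGrad sketch (illustrative, not the paper's code): the
    # gradient explanation is averaged over Gaussian-perturbed copies of the
    # input, i.e., the explanation is smoothed with a Gaussian kernel.
    def smoothgrad(model, x, target_class, n_samples=50, sigma=0.15):
        grads = torch.zeros_like(x)
        for _ in range(n_samples):
            # Perturb the input with Gaussian noise and track gradients.
            noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
            # Class score; assumes the model outputs logits with classes last.
            score = model(noisy)[..., target_class].sum()
            # Gradient of the class score w.r.t. the noisy input.
            grads += torch.autograd.grad(score, noisy)[0]
        return grads / n_samples  # averaged saliency map

Increasing sigma or n_samples yields smoother, more robust saliency maps; the paper's trade-off result says that this added robustness can simultaneously make the explanation less faithful to the underlying model.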

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-tan23a,
  title     = {Robust Explanation for Free or At the Cost of Faithfulness},
  author    = {Tan, Zeren and Tian, Yang},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {33534--33562},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/tan23a/tan23a.pdf},
  url       = {https://proceedings.mlr.press/v202/tan23a.html}
}
Endnote
%0 Conference Paper
%T Robust Explanation for Free or At the Cost of Faithfulness
%A Zeren Tan
%A Yang Tian
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-tan23a
%I PMLR
%P 33534--33562
%U https://proceedings.mlr.press/v202/tan23a.html
%V 202
APA
Tan, Z. & Tian, Y. (2023). Robust Explanation for Free or At the Cost of Faithfulness. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:33534-33562. Available from https://proceedings.mlr.press/v202/tan23a.html.
