Don’t be fooled: label leakage in explanation methods and the importance of their quantitative evaluation

Neil Jethani, Adriel Saporta, Rajesh Ranganath
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8925-8953, 2023.

Abstract

Feature attribution methods identify which features of an input most influence a model’s output. Most widely-used feature attribution methods (such as SHAP, LIME, and Grad-CAM) are “class-dependent” methods in that they generate a feature attribution vector as a function of class. In this work, we demonstrate that class-dependent methods can “leak” information about the selected class, making that class appear more likely than it is. Thus, an end user runs the risk of drawing false conclusions when interpreting an explanation generated by a class-dependent method. In contrast, we introduce “distribution-aware” methods, which favor explanations that keep the label’s distribution close to its distribution given all features of the input. We introduce SHAP-KL and FastSHAP-KL, two baseline distribution-aware methods that compute Shapley values. Finally, we perform a comprehensive evaluation of seven class-dependent and three distribution-aware methods on three clinical datasets of different high-dimensional data types: images, biosignals, and text.
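As a rough illustration of the distribution-aware criterion described above, the sketch below scores a feature subset by how closely the model's predictive distribution given only that subset matches its distribution given all features, measured with a KL divergence. The function names (`kl_divergence`, `distribution_aware_value`), the baseline-replacement masking, and the sign convention are assumptions made for this sketch; it is not the paper's SHAP-KL or FastSHAP-KL implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def distribution_aware_value(model_probs, x, subset_mask, baseline):
    """Score a feature subset by how well it preserves the model's
    predictive distribution over *all* classes (higher is better).

    model_probs: callable mapping an input to a vector of class probabilities
    x:           the input being explained (1-D feature array)
    subset_mask: boolean array; True where the feature is kept
    baseline:    values substituted for the masked-out features
    """
    x_masked = np.where(subset_mask, x, baseline)
    p_full = model_probs(x)           # distribution given all features
    p_subset = model_probs(x_masked)  # distribution given the kept features
    # Negate so that subsets which keep the two distributions close score highest.
    return -kl_divergence(p_full, p_subset)
```

Under a score of this form, a subset is preferred when it keeps the label's full predictive distribution intact, rather than when it merely inflates the probability of a single pre-selected class.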

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-jethani23a,
  title     = {Don’t be fooled: label leakage in explanation methods and the importance of their quantitative evaluation},
  author    = {Jethani, Neil and Saporta, Adriel and Ranganath, Rajesh},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {8925--8953},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/jethani23a/jethani23a.pdf},
  url       = {https://proceedings.mlr.press/v206/jethani23a.html}
}
Endnote
%0 Conference Paper
%T Don’t be fooled: label leakage in explanation methods and the importance of their quantitative evaluation
%A Neil Jethani
%A Adriel Saporta
%A Rajesh Ranganath
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-jethani23a
%I PMLR
%P 8925--8953
%U https://proceedings.mlr.press/v206/jethani23a.html
%V 206
APA
Jethani, N., Saporta, A. & Ranganath, R. (2023). Don’t be fooled: label leakage in explanation methods and the importance of their quantitative evaluation. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8925-8953. Available from https://proceedings.mlr.press/v206/jethani23a.html.