When Explanations Lie: Why Many Modified BP Attributions Fail

Leon Sixt, Maximilian Granz, Tim Landgraf
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9046-9057, 2020.

Abstract

Attribution methods aim to explain a neural network’s prediction by highlighting the most relevant image areas. A popular approach is to backpropagate (BP) a custom relevance score using modified rules, rather than the gradient. We analyze an extensive set of modified BP methods: Deep Taylor Decomposition, Layer-wise Relevance Propagation (LRP), Excitation BP, PatternAttribution, DeepLIFT, Deconv, RectGrad, and Guided BP. We find empirically that the explanations of all mentioned methods, except for DeepLIFT, are independent of the parameters of later layers. We provide theoretical insights for this surprising behavior and also analyze why DeepLIFT does not suffer from this limitation. Empirically, we measure how information of later layers is ignored by using our new metric, cosine similarity convergence (CSC). The paper provides a framework to assess the faithfulness of new and existing modified BP methods theoretically and empirically.
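The abstract's central quantity, cosine similarity convergence (CSC), can be illustrated with a small numerical experiment. The sketch below is not the authors' implementation; it assumes a toy fully connected ReLU network, a hypothetical z+-style modified backward rule (relevance redistributed through positive weights only, in the spirit of the LRP variants analyzed in the paper), and it only approximates the idea behind CSC: backpropagate relevance from two different output classes and compare the resulting relevance vectors layer by layer. If the cosine similarity approaches 1 toward the input, the explanation has become insensitive to which later-layer information it started from.

```python
# Minimal illustrative sketch (assumptions as described above, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 5 dense ReLU layers with random weights.
layer_dims = [64, 64, 64, 64, 64, 10]
weights = [rng.standard_normal((m, n)) / np.sqrt(m)
           for m, n in zip(layer_dims[:-1], layer_dims[1:])]

def forward(x):
    """Forward pass, storing each layer's input/output activations."""
    acts = [x]
    for W in weights:
        x = np.maximum(x @ W, 0.0)
        acts.append(x)
    return acts

def zplus_backward(acts, relevance_out):
    """Backpropagate relevance with a z+-style rule (positive weights only).

    Returns the relevance vector at every layer, input layer first.
    """
    relevances = [relevance_out]
    r = relevance_out
    for W, a in zip(reversed(weights), reversed(acts[:-1])):
        Wp = np.maximum(W, 0.0)        # keep only positive weights
        z = a @ Wp + 1e-9              # positive forward contributions
        r = a * ((r / z) @ Wp.T)       # redistribute relevance proportionally
        relevances.append(r)
    return relevances[::-1]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Seed relevance at two different output classes and compare layer by layer.
x = np.abs(rng.standard_normal(layer_dims[0]))
acts = forward(x)
rel_a = zplus_backward(acts, np.eye(layer_dims[-1])[0])
rel_b = zplus_backward(acts, np.eye(layer_dims[-1])[7])

for layer, (a, b) in enumerate(zip(rel_a, rel_b)):
    print(f"layer {layer}: cosine similarity = {cosine(a, b):.4f}")
```

At the output layer the two relevance seeds are orthogonal (cosine similarity 0); the point of the paper's analysis is that, for rules of this kind, the similarity grows toward 1 as relevance is propagated back, i.e. the attribution converges to a direction that no longer depends on the chosen class or the later layers' parameters.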

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-sixt20a,
  title     = {When Explanations Lie: Why Many Modified {BP} Attributions Fail},
  author    = {Sixt, Leon and Granz, Maximilian and Landgraf, Tim},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9046--9057},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/sixt20a/sixt20a.pdf},
  url       = {https://proceedings.mlr.press/v119/sixt20a.html}
}
APA
Sixt, L., Granz, M. & Landgraf, T. (2020). When Explanations Lie: Why Many Modified BP Attributions Fail. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9046-9057. Available from https://proceedings.mlr.press/v119/sixt20a.html.