Saliency strikes back: How filtering out high frequencies improves white-box explanations

Sabine Muzellec, Thomas Fel, Victor Boutin, Léo Andéol, Rufin Vanrullen, Thomas Serre
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:37041-37075, 2024.

Abstract

Attribution methods correspond to a class of explainability (XAI) methods that aim to assess how individual inputs contribute to a model’s decision-making process. We identify a significant limitation of one class of attribution methods, known as “white-box” methods: although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called “FORGrad”. This simple method filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of existing white-box methods, enabling them to compete with more accurate yet computationally demanding “black-box” methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.
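
As a rough illustration of the idea described in the abstract (a sketch of the core principle, not the authors' released implementation), the Python snippet below low-pass filters a plain-gradient saliency map in the Fourier domain. The helper names, the choice of vanilla gradients, and the cut-off parameter sigma are illustrative assumptions; the paper selects optimal cut-off frequencies per model architecture.

    # Sketch only: low-pass filtering of a gradient saliency map in the
    # Fourier domain, in the spirit of FORGrad as described in the abstract.
    # Function names and the cut-off `sigma` are assumptions for illustration.
    import torch

    def vanilla_saliency(model, x: torch.Tensor, target: int) -> torch.Tensor:
        """Plain-gradient saliency for one image x of shape (C, H, W)."""
        x = x.clone().requires_grad_(True)
        score = model(x.unsqueeze(0))[0, target]   # logit of the target class
        score.backward()
        # Collapse channels to a single (H, W) attribution map.
        return x.grad.abs().max(dim=0).values

    def low_pass_filter(saliency: torch.Tensor, sigma: float) -> torch.Tensor:
        """Zero out frequencies whose radius exceeds the cut-off `sigma`
        (normalized frequency in [0, ~0.707]) in a (H, W) saliency map."""
        h, w = saliency.shape
        fy = torch.fft.fftfreq(h).reshape(-1, 1)   # vertical frequencies
        fx = torch.fft.fftfreq(w).reshape(1, -1)   # horizontal frequencies
        radius = torch.sqrt(fx ** 2 + fy ** 2)     # radial frequency grid
        mask = (radius <= sigma).to(saliency.dtype)
        spectrum = torch.fft.fft2(saliency)
        return torch.fft.ifft2(spectrum * mask).real

In practice, the cut-off would be tuned per architecture, for instance by sweeping sigma and keeping the value that maximizes a faithfulness metric such as Deletion or Insertion scores.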

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-muzellec24a,
  title     = {Saliency strikes back: How filtering out high frequencies improves white-box explanations},
  author    = {Muzellec, Sabine and Fel, Thomas and Boutin, Victor and And\'{e}ol, L\'{e}o and Vanrullen, Rufin and Serre, Thomas},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {37041--37075},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/muzellec24a/muzellec24a.pdf},
  url       = {https://proceedings.mlr.press/v235/muzellec24a.html}
}
Endnote
%0 Conference Paper
%T Saliency strikes back: How filtering out high frequencies improves white-box explanations
%A Sabine Muzellec
%A Thomas Fel
%A Victor Boutin
%A Léo Andéol
%A Rufin Vanrullen
%A Thomas Serre
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-muzellec24a
%I PMLR
%P 37041--37075
%U https://proceedings.mlr.press/v235/muzellec24a.html
%V 235
APA
Muzellec, S., Fel, T., Boutin, V., Andéol, L., Vanrullen, R. & Serre, T. (2024). Saliency strikes back: How filtering out high frequencies improves white-box explanations. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:37041-37075. Available from https://proceedings.mlr.press/v235/muzellec24a.html.