Generating Deep Networks Explanations with Robust Attribution Alignment

Guohang Zeng, Yousef Kowsar, Sarah Erfani, James Bailey
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:753-768, 2021.

Abstract

Attribution methods play a key role in generating post-hoc explanations for pre-trained models; however, existing methods have been shown to yield unfaithful and noisy explanations. In this paper, we propose a new paradigm of attribution method: we treat the model's explanations as part of the network's outputs and generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Inspired by recent studies showing that the saliency maps of adversarially robust models align well with human perception, we use attribution maps from a robust model to supervise the learned attributions. Our proposed method can produce visually plausible explanations alongside the prediction at inference time. Experiments on real datasets show that our proposed method yields more faithful explanations than post-hoc attribution methods, at a lower computational cost.
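To make the idea concrete, below is a minimal PyTorch sketch of the training setup the abstract describes: a classifier that emits an attribution map as an additional output, up-sampled from its last convolutional stage, and trained to align with the input-gradient saliency of an adversarially robust model. All names (ExplainableNet, attribution_head, robust_saliency), the ResNet-18 backbone, and the L1 alignment loss are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of "explanations as a network output" with robust
# attribution alignment. Assumes torchvision >= 0.13 for weights=None.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class ExplainableNet(nn.Module):
    """Classifier that also emits an attribution map as an output."""
    def __init__(self, num_classes=10):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Everything up to and including the last convolutional stage.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)
        # 1x1 conv collapses the feature maps into a single-channel map.
        self.attribution_head = nn.Conv2d(512, 1, kernel_size=1)

    def forward(self, x):
        feats = self.features(x)                       # (B, 512, h, w)
        logits = self.fc(self.pool(feats).flatten(1))  # (B, num_classes)
        # Up-sample the coarse map back to the input resolution.
        attr = F.interpolate(self.attribution_head(feats),
                             size=x.shape[-2:], mode='bilinear',
                             align_corners=False)
        return logits, attr.squeeze(1)                 # (B, H, W)

def robust_saliency(robust_model, x, y):
    """Input-gradient saliency of an adversarially robust model,
    used here as the supervision signal for the learned attributions."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(robust_model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().sum(dim=1)                       # (B, H, W)

def training_loss(model, robust_model, x, y, lam=1.0):
    logits, attr = model(x)
    target = robust_saliency(robust_model, x, y).detach()
    # Classification term plus an alignment term (L1 here, as one
    # plausible choice) between learned and robust attributions.
    return F.cross_entropy(logits, y) + lam * F.l1_loss(attr, target)
```

At inference time only a single forward pass through ExplainableNet is needed, which is why this style of method can be cheaper than post-hoc attribution: the robust model is consulted only during training.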

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-zeng21b,
  title     = {Generating Deep Networks Explanations with Robust Attribution Alignment},
  author    = {Zeng, Guohang and Kowsar, Yousef and Erfani, Sarah and Bailey, James},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {753--768},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/zeng21b/zeng21b.pdf},
  url       = {https://proceedings.mlr.press/v157/zeng21b.html},
  abstract  = {Attribution methods play a key role in generating post-hoc explanations for pre-trained models; however, existing methods have been shown to yield unfaithful and noisy explanations. In this paper, we propose a new paradigm of attribution method: we treat the model's explanations as part of the network's outputs and generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Inspired by recent studies showing that the saliency maps of adversarially robust models align well with human perception, we use attribution maps from a robust model to supervise the learned attributions. Our proposed method can produce visually plausible explanations alongside the prediction at inference time. Experiments on real datasets show that our proposed method yields more faithful explanations than post-hoc attribution methods, at a lower computational cost.}
}
Endnote
%0 Conference Paper
%T Generating Deep Networks Explanations with Robust Attribution Alignment
%A Guohang Zeng
%A Yousef Kowsar
%A Sarah Erfani
%A James Bailey
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zeng21b
%I PMLR
%P 753--768
%U https://proceedings.mlr.press/v157/zeng21b.html
%V 157
%X Attribution methods play a key role in generating post-hoc explanations for pre-trained models; however, existing methods have been shown to yield unfaithful and noisy explanations. In this paper, we propose a new paradigm of attribution method: we treat the model's explanations as part of the network's outputs and generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Inspired by recent studies showing that the saliency maps of adversarially robust models align well with human perception, we use attribution maps from a robust model to supervise the learned attributions. Our proposed method can produce visually plausible explanations alongside the prediction at inference time. Experiments on real datasets show that our proposed method yields more faithful explanations than post-hoc attribution methods, at a lower computational cost.
APA
Zeng, G., Kowsar, Y., Erfani, S. & Bailey, J. (2021). Generating Deep Networks Explanations with Robust Attribution Alignment. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:753-768. Available from https://proceedings.mlr.press/v157/zeng21b.html.
