Towards Robust Saliency Maps

Nham Le, Arie Gurfinkel, Xujie Si, Chuqin Geng
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:351-366, 2025.

Abstract

Saliency maps are one of the most popular tools to interpret the operation of a neural network: they compute input features deemed relevant to the final prediction, which are often subsets of pixels that are easily understandable by a human being. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG) – a simple yet surprisingly effective saliency map method – and the network’s prediction: given a network, if an input x emits a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify x in a certain way. We then introduce a novel method that combines both Marabou and Crown – two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.
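To make the setting concrete, here is a minimal sketch of how a Vanilla Gradient (VG) saliency map is typically computed: the gradient of a class score with respect to the input. This is a generic PyTorch illustration of the well-known VG method, not the authors' code or the verification pipeline described in the paper; the function name, interface, and single-example batching are assumptions made for the example.

import torch

def vanilla_gradient_saliency(model, x, target_class=None):
    """Illustrative Vanilla Gradient (VG) saliency map: the gradient of a
    class score with respect to the input (not the paper's implementation)."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)     # track gradients w.r.t. the input
    scores = model(x)                                # shape: (1, num_classes)
    if target_class is None:
        target_class = scores.argmax(dim=1).item()   # explain the predicted class
    scores[0, target_class].backward()               # back-propagate the chosen class score
    return x.grad.detach()                           # saliency map, same shape as x

For an MNIST classifier, for instance, x would be a (1, 1, 28, 28) tensor and the returned map indicates how sensitive the predicted-class score is to each pixel (some VG variants additionally take the absolute value). The saliency-robustness specification proposed in the paper then asks, roughly, whether any input that emits a given VG map must be classified in a particular way, a question the authors answer by combining the Marabou and Crown verifiers.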

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-le25a,
  title     = {Towards Robust Saliency Maps},
  author    = {Le, Nham and Gurfinkel, Arie and Si, Xujie and Geng, Chuqin},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages     = {351--366},
  year      = {2025},
  editor    = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume    = {260},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--08 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/le25a/le25a.pdf},
  url       = {https://proceedings.mlr.press/v260/le25a.html},
  abstract  = {Saliency maps are one of the most popular tools to interpret the operation of a neural network: they compute input features deemed relevant to the final prediction, which are often subsets of pixels that are easily understandable by a human being. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG) – a simple yet surprisingly effective saliency map method – and the network’s prediction: given a network, if an input $x$ emits a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify $x$ in a certain way. We then introduce a novel method that combines both Marabou and Crown – two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.}
}
Endnote
%0 Conference Paper
%T Towards Robust Saliency Maps
%A Nham Le
%A Arie Gurfinkel
%A Xujie Si
%A Chuqin Geng
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-le25a
%I PMLR
%P 351--366
%U https://proceedings.mlr.press/v260/le25a.html
%V 260
%X Saliency maps are one of the most popular tools to interpret the operation of a neural network: they compute input features deemed relevant to the final prediction, which are often subsets of pixels that are easily understandable by a human being. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG) – a simple yet surprisingly effective saliency map method – and the network’s prediction: given a network, if an input $x$ emits a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify $x$ in a certain way. We then introduce a novel method that combines both Marabou and Crown – two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.
APA
Le, N., Gurfinkel, A., Si, X. & Geng, C. (2025). Towards Robust Saliency Maps. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:351-366. Available from https://proceedings.mlr.press/v260/le25a.html.