Towards Robust Saliency Maps
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:351-366, 2025.
Abstract
Saliency maps are one of the most popular tools for interpreting the operation of a neural network: they identify input features deemed relevant to the final prediction, often subsets of pixels that a human can readily understand. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading.
In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG), a simple yet surprisingly effective saliency map method, and the network’s prediction: given a network, if an input x yields a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify x in a certain way.
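To make the object of the specification concrete, the following is a minimal sketch of how a Vanilla Gradient saliency map is typically computed (shown in PyTorch; model and x are assumed placeholders for the classifier and the input, and the exact channel reduction used in the paper may differ):

import torch

def vanilla_gradient(model, x, target_class=None):
    # Vanilla Gradient: the gradient of the chosen class score with respect
    # to the input. `model` is assumed to map a batch of inputs to logits;
    # `target_class` defaults to the predicted class.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if target_class is None:
        target_class = logits.argmax(dim=1)
    score = logits.gather(1, target_class.view(-1, 1)).sum()
    score.backward()
    # The raw gradient is the saliency map; taking its absolute value is a
    # common presentation choice.
    return x.grad.detach().abs()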
We then introduce a novel method that combines Marabou and Crown, two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.
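The abstract does not detail how the two verifiers are combined; one common pattern for pairing an incomplete, Crown-style bound-propagation pass with a complete, Marabou-style solver is sketched below. This is a generic illustration only, not the authors' actual procedure, and the callables are hypothetical placeholders rather than the maraboupy or auto_LiRPA APIs.

from typing import Callable, Optional

def cascade_verify(
    incomplete_proves: Callable[[], bool],
    complete_counterexample: Callable[[], Optional[object]],
) -> str:
    # Try the cheap, sound-but-incomplete pass first (e.g. bound propagation).
    if incomplete_proves():
        return "verified"
    # Otherwise fall back to a complete solver that either returns a
    # counterexample or proves that none exists.
    return "falsified" if complete_counterexample() is not None else "verified"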