Towards Robust Saliency Maps
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:351-366, 2025.
Abstract
Saliency maps are one of the most popular tools for interpreting the operation of a neural network: they identify input features deemed relevant to the final prediction, often subsets of pixels that a human can readily understand. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading.
In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG), a simple yet surprisingly effective saliency map method, and the network’s prediction: given a network, if an input x yields a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify x in a certain way.
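To make the object of the specification concrete, the following is a minimal sketch of how a Vanilla Gradient saliency map is typically computed (shown in PyTorch; model and x are assumed placeholders for the classifier and the input, and the exact channel reduction used in the paper may differ):

import torch

def vanilla_gradient(model, x, target_class=None):
    # Vanilla Gradient: the gradient of the chosen class score with respect
    # to the input. `model` is assumed to map a batch of inputs to logits;
    # `target_class` defaults to the predicted class.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if target_class is None:
        target_class = logits.argmax(dim=1)
    score = logits.gather(1, target_class.view(-1, 1)).sum()
    score.backward()
    # The raw gradient is the saliency map; taking its absolute value is a
    # common presentation choice.
    return x.grad.detach().abs()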
We then introduce a novel method that combines Marabou and Crown, two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.
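The abstract does not detail how the two verifiers are combined; one common pattern for pairing an incomplete, Crown-style bound-propagation pass with a complete, Marabou-style solver is sketched below. This is a generic illustration only, not the authors' actual procedure, and the callables are hypothetical placeholders rather than the maraboupy or auto_LiRPA APIs.

from typing import Callable, Optional

def cascade_verify(
    incomplete_proves: Callable[[], bool],
    complete_counterexample: Callable[[], Optional[object]],
) -> str:
    # Try the cheap, sound-but-incomplete pass first (e.g. bound propagation).
    if incomplete_proves():
        return "verified"
    # Otherwise fall back to a complete solver that either returns a
    # counterexample or proves that none exists.
    return "falsified" if complete_counterexample() is not None else "verified"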