Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5848-5856, 2019.
Abstract
Current saliency map interpretations for neural networks generally rely on two key assumptions. First, they use first-order approximations of the loss function, neglecting higher-order terms such as the loss curvature. Second, they evaluate each feature's importance in isolation, ignoring feature interdependencies. This work studies the effect of relaxing these two assumptions. First, we characterize a closed-form formula for the input Hessian matrix of a deep ReLU network. Using this formula, we show that, for classification problems with many classes, if a prediction has high probability then including the Hessian term has a small impact on the interpretation. We prove this result by demonstrating that these conditions cause the Hessian matrix to be approximately rank one and its leading eigenvector to be almost parallel to the gradient of the loss. We empirically validate this theory by interpreting ImageNet classifiers. Second, we incorporate feature interdependencies by calculating the importance of group-features using a sparsity regularization term. We use an L0-L1 relaxation technique along with proximal gradient descent to efficiently compute group-feature importance values. Our empirical results show that our method significantly improves deep learning interpretations.
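The abstract's second contribution relies on proximal gradient descent for a sparsity-regularized objective. As a minimal sketch of that optimization pattern (using the standard L1 soft-thresholding proximal operator on a synthetic sparse least-squares problem, not the paper's actual interpretation objective; all names and parameters here are illustrative assumptions):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_f, x0, lam, step, iters=500):
    """Minimize f(x) + lam * ||x||_1: gradient step on f, then prox step on the L1 term."""
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

# Synthetic example: recover a sparse x from noiseless measurements b = A @ x_true.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true

L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - b),
                          np.zeros(10), lam=0.1, step=1.0 / L)
```

The regularizer drives the coordinates outside the true support toward exactly zero, which is the mechanism the paper uses (via an L0-L1 relaxation) to select important feature groups.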