Deep learning interpretation: Flip points and homotopy methods
Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:1-26, 2020.
Abstract
Deep learning models are complicated mathematical functions, and their interpretation remains a challenging research question. We formulate and solve optimization problems to answer questions about the models and their outputs. Specifically, we develop methods to study the decision boundaries of classification models using flip points. A flip point is any point that lies on the boundary between two output classes: for example, for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for “yes” and “no”. The flip point closest to a given input is of particular importance, and this point is the solution to a well-posed optimization problem. To compute the closest flip point, we develop a homotopy algorithm that overcomes the issues of vanishing and exploding gradients and finds a feasible solution for our optimization problem. We show that computing closest flip points allows us to systematically investigate the model, identify its decision boundaries, interpret and audit it with respect to individual inputs and entire datasets, and expose its vulnerability to adversarial attacks. We demonstrate that flip points can help identify mistakes made by a model, improve the model’s accuracy, and reveal the most influential features for classifications.
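To make the idea of a closest flip point concrete, the sketch below works through a toy case that is not taken from the paper: a linear two-class model with scores z = Wx + b. Here the decision boundary is a hyperplane, so the closest flip point in Euclidean distance is an orthogonal projection and needs no iterative solver. The function name, weights, and input are all hypothetical. For a deep network the boundary is nonlinear, and the problem calls for a numerical method such as the homotopy algorithm described in the abstract, which this sketch does not implement.

```python
import numpy as np

def closest_flip_point_linear(W, b, x):
    """Closest point to x (Euclidean) at which a linear two-class model
    z = W @ x + b gives equal scores. Illustrative toy case only."""
    w = W[0] - W[1]          # normal vector of the decision boundary
    c = b[0] - b[1]          # offset of the decision boundary
    # Scores are equal exactly on the hyperplane w . x + c = 0,
    # so the closest flip point is the orthogonal projection of x onto it.
    return x - ((w @ x + c) / (w @ w)) * w

# Hypothetical weights and input, chosen only for illustration.
W = np.array([[1.0, 2.0], [0.5, -1.0]])
b = np.array([0.0, 1.0])
x = np.array([2.0, 1.0])

x_flip = closest_flip_point_linear(W, b, x)
scores = W @ x_flip + b              # the two scores coincide at the flip point
distance = np.linalg.norm(x - x_flip)  # how far x is from the decision boundary
direction = x_flip - x               # which features must change, and by how much
```

The distance gives a rough sense of how robust the classification of x is, and the components of the direction vector indicate which features most influence the decision for that input, in the spirit of the uses listed in the abstract.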