Can You Trust This Prediction? Auditing Pointwise Reliability After Learning

Peter Schulam, Suchi Saria
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1022-1031, 2019.

Abstract

To use machine learning in high-stakes applications (e.g., medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g., using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model's loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability after training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific train-time algorithms like these methods do.
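
The abstract describes RUE only at a high level; the sketch below makes the ensemble construction concrete under stated assumptions. It is our illustration, not the paper's code: a least-squares model stands in for the trained predictor, and the function and parameter names (rue_scores, n_bootstrap) are hypothetical. Writing theta_hat for the fitted parameters, g_i for the gradient of example i's loss, and H for the Hessian of the total training loss at theta_hat, each bootstrap reweighting w of the training set yields an approximate refit theta_b ≈ theta_hat - H^{-1} sum_i (w_i - 1) g_i, and the spread of the resulting ensemble of predictions at a test point serves as its reliability score.

import numpy as np

def rue_scores(X_train, y_train, X_test, n_bootstrap=50, seed=0):
    """Score pointwise reliability as ensemble spread under approximate bootstrap refits."""
    n, d = X_train.shape
    rng = np.random.default_rng(seed)

    # Stand-in "trained model": theta_hat minimizes the summed squared loss.
    theta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    # Per-example gradients g_i and Hessian H of the training loss at theta_hat.
    # For squared loss: g_i = (x_i . theta - y_i) x_i and H = X^T X.
    residuals = X_train @ theta_hat - y_train
    grads = X_train * residuals[:, None]       # row i is g_i, shape (n, d)
    H = X_train.T @ X_train                    # shape (d, d)

    preds = np.empty((n_bootstrap, X_test.shape[0]))
    for b in range(n_bootstrap):
        # Multinomial bootstrap weights: how often each training point is resampled.
        w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        # One Newton step from theta_hat approximates refitting on the resample:
        # theta_b = theta_hat - H^{-1} sum_i (w_i - 1) g_i.
        delta = np.linalg.solve(H, grads.T @ (w - 1.0))
        preds[b] = X_test @ (theta_hat - delta)

    # Large spread means the prediction is sensitive to the training sample.
    return preds.std(axis=0)

Solving the linear system with np.linalg.solve avoids forming H^{-1} explicitly; for large models such as neural networks, one would replace the dense Hessian with Hessian-vector products, but the auditing logic stays the same.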

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-schulam19a,
  title     = {Can You Trust This Prediction? Auditing Pointwise Reliability After Learning},
  author    = {Schulam, Peter and Saria, Suchi},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1022--1031},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/schulam19a/schulam19a.pdf},
  url       = {https://proceedings.mlr.press/v89/schulam19a.html}
}
Endnote
%0 Conference Paper
%T Can You Trust This Prediction? Auditing Pointwise Reliability After Learning
%A Peter Schulam
%A Suchi Saria
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-schulam19a
%I PMLR
%P 1022--1031
%U https://proceedings.mlr.press/v89/schulam19a.html
%V 89
APA
Schulam, P. & Saria, S. (2019). Can You Trust This Prediction? Auditing Pointwise Reliability After Learning. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1022-1031. Available from https://proceedings.mlr.press/v89/schulam19a.html.
