Evaluating Approximate Inference in Bayesian Deep Learning

Andrew Gordon Wilson, Pavel Izmailov, Matthew D Hoffman, Yarin Gal, Yingzhen Li, Melanie F Pradier, Sharad Vikram, Andrew Foong, Sanae Lotfi, Sebastian Farquhar
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, PMLR 176:113-124, 2022.

Abstract

Uncertainty representation is crucial to the safe and reliable deployment of deep learning. Bayesian methods provide a natural mechanism to represent epistemic uncertainty, leading to improved generalization and calibrated predictive distributions. Understanding the fidelity of approximate inference has extraordinary value beyond the standard approach of measuring generalization on a particular task: if approximate inference is working correctly, then we can expect more reliable and accurate deployment across any number of real-world settings. In this competition, we evaluate the fidelity of approximate Bayesian inference procedures in deep learning, using as a reference Hamiltonian Monte Carlo (HMC) samples obtained by parallelizing computations over hundreds of tensor processing unit (TPU) devices. We consider a variety of tasks, including image recognition, regression, covariate shift, and medical applications. All data are publicly available, and we release several baselines, including stochastic MCMC, variational methods, and deep ensembles. The competition resulted in hundreds of submissions across many teams. The winning entries all involved novel multi-modal posterior approximations, highlighting the relative importance of representing multiple modes, and suggesting that we should not consider deep ensembles a “non-Bayesian” alternative to standard unimodal approximations. In the future, the competition will provide a foundation for innovation and continued benchmarking of approximate Bayesian inference procedures in deep learning. The HMC samples will remain available through the competition website.
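The abstract describes scoring approximate inference procedures against an HMC reference. As a concrete illustration (not the competition's released evaluation code), the sketch below computes two natural fidelity measures between an approximate method's posterior predictive distribution and the HMC reference predictive: top-1 agreement and mean total variation distance. All shapes, sample counts, and function names here are illustrative assumptions.

```python
import numpy as np

def predictive_probs(logits_per_sample):
    """Average softmax probabilities over posterior samples.

    logits_per_sample: array of shape (num_samples, num_test, num_classes).
    Returns the Bayesian model average, shape (num_test, num_classes).
    """
    # Numerically stable softmax per sample, then average over samples.
    z = logits_per_sample - logits_per_sample.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.mean(axis=0)

def agreement(p_approx, p_ref):
    """Fraction of test points where the top-1 predictions coincide."""
    return np.mean(p_approx.argmax(-1) == p_ref.argmax(-1))

def total_variation(p_approx, p_ref):
    """Mean total variation distance between predictive distributions."""
    return np.mean(0.5 * np.abs(p_approx - p_ref).sum(-1))

# Illustrative usage with random stand-ins for real model outputs:
# e.g. 100 HMC samples vs. a 20-member approximation on 500 test points.
rng = np.random.default_rng(0)
hmc_logits = rng.normal(size=(100, 500, 10))
approx_logits = rng.normal(size=(20, 500, 10))

p_ref = predictive_probs(hmc_logits)
p_approx = predictive_probs(approx_logits)
print(f"agreement:   {agreement(p_approx, p_ref):.3f}")
print(f"TV distance: {total_variation(p_approx, p_ref):.3f}")
```

Metrics of this flavor, computed against the released HMC predictive distributions, make the fidelity of approximate inference directly measurable, independent of raw task accuracy.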

Cite this Paper


BibTeX
@InProceedings{pmlr-v176-wilson22a,
  title     = {Evaluating Approximate Inference in Bayesian Deep Learning},
  author    = {Wilson, Andrew Gordon and Izmailov, Pavel and Hoffman, Matthew D and Gal, Yarin and Li, Yingzhen and Pradier, Melanie F and Vikram, Sharad and Foong, Andrew and Lotfi, Sanae and Farquhar, Sebastian},
  booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
  pages     = {113--124},
  year      = {2022},
  editor    = {Kiela, Douwe and Ciccone, Marco and Caputo, Barbara},
  volume    = {176},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v176/wilson22a/wilson22a.pdf},
  url       = {https://proceedings.mlr.press/v176/wilson22a.html}
}
Endnote
%0 Conference Paper
%T Evaluating Approximate Inference in Bayesian Deep Learning
%A Andrew Gordon Wilson
%A Pavel Izmailov
%A Matthew D Hoffman
%A Yarin Gal
%A Yingzhen Li
%A Melanie F Pradier
%A Sharad Vikram
%A Andrew Foong
%A Sanae Lotfi
%A Sebastian Farquhar
%B Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track
%C Proceedings of Machine Learning Research
%D 2022
%E Douwe Kiela
%E Marco Ciccone
%E Barbara Caputo
%F pmlr-v176-wilson22a
%I PMLR
%P 113--124
%U https://proceedings.mlr.press/v176/wilson22a.html
%V 176
APA
Wilson, A.G., Izmailov, P., Hoffman, M.D., Gal, Y., Li, Y., Pradier, M.F., Vikram, S., Foong, A., Lotfi, S. & Farquhar, S. (2022). Evaluating Approximate Inference in Bayesian Deep Learning. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, in Proceedings of Machine Learning Research 176:113-124. Available from https://proceedings.mlr.press/v176/wilson22a.html.
