Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes

Sebastian W Ober, Laurence Aitchison
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8248-8259, 2021.

Abstract

We consider the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the two model classes. Our approximate posterior uses learned "global" inducing points, which are defined only at the input layer and propagated through the network to obtain inducing inputs at subsequent layers. By contrast, standard "local" inducing point methods from the deep Gaussian process literature optimise a separate set of inducing inputs at every layer, and thus do not model correlations across layers. Our method gives state-of-the-art performance for a variational Bayesian method on CIFAR-10, reaching 86.7% accuracy without data augmentation or tempering, which is comparable to SGMCMC run without tempering but with data augmentation (88% in Wenzel et al., 2020).
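To illustrate the distinction between global and local inducing points, below is a minimal PyTorch sketch of a small fully connected network in which inducing inputs are learned only at the input layer and then propagated through the layers alongside the data. The names (GlobalInducingLayer, pseudo_outputs) and the deterministic ridge-regression construction of each layer's weights are illustrative assumptions, not the paper's implementation; the actual method places a Gaussian approximate posterior over the weights conditioned on the propagated inducing points and is trained by maximising an ELBO.

import torch
import torch.nn as nn

class GlobalInducingLayer(nn.Module):
    # One fully connected layer whose weights are determined by the inducing
    # inputs propagated from the layer below and by learned pseudo-outputs at
    # those inducing locations (a deterministic simplification for illustration).
    def __init__(self, out_features, num_inducing):
        super().__init__()
        self.pseudo_outputs = nn.Parameter(torch.randn(num_inducing, out_features))
        self.log_noise = nn.Parameter(torch.zeros(()))

    def forward(self, x, u):
        # x: data activations (batch, d_in); u: propagated inducing inputs (M, d_in).
        d_in = u.shape[1]
        gram = u.T @ u + self.log_noise.exp() * torch.eye(d_in)
        weights = torch.linalg.solve(gram, u.T @ self.pseudo_outputs)  # (d_in, d_out)
        # The same weights propagate both the data and the inducing points.
        return x @ weights, u @ weights

class GlobalInducingMLP(nn.Module):
    def __init__(self, sizes, num_inducing=10):
        super().__init__()
        # Global inducing inputs are defined only at the input layer.
        self.inducing_inputs = nn.Parameter(torch.randn(num_inducing, sizes[0]))
        self.layers = nn.ModuleList(
            GlobalInducingLayer(d_out, num_inducing) for d_out in sizes[1:]
        )

    def forward(self, x):
        u = self.inducing_inputs
        for i, layer in enumerate(self.layers):
            x, u = layer(x, u)
            if i < len(self.layers) - 1:
                x, u = torch.relu(x), torch.relu(u)
        return x

model = GlobalInducingMLP([5, 20, 1])
print(model(torch.randn(3, 5)).shape)  # torch.Size([3, 1])

A standard "local" scheme would instead learn a separate set of inducing inputs at every layer rather than reusing the propagated u, which is what prevents it from modelling correlations across layers.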

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-ober21a,
  title     = {Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes},
  author    = {Ober, Sebastian W and Aitchison, Laurence},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8248--8259},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/ober21a/ober21a.pdf},
  url       = {https://proceedings.mlr.press/v139/ober21a.html}
}
Endnote
%0 Conference Paper
%T Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes
%A Sebastian W Ober
%A Laurence Aitchison
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-ober21a
%I PMLR
%P 8248--8259
%U https://proceedings.mlr.press/v139/ober21a.html
%V 139
APA
Ober, S. W., & Aitchison, L. (2021). Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8248-8259. Available from https://proceedings.mlr.press/v139/ober21a.html.