Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

Jiri Hron; Roman Novak; Jeffrey Pennington; Jascha Sohl-Dickstein

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8926-8945, 2022.

Abstract

We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks. Improvements are achieved at all widths, with the margin between reparametrised and standard BNNs growing with layer width.
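The effective sample size (ESS) quoted above measures how many independent draws a correlated MCMC chain is worth. As a rough illustration only, the sketch below estimates ESS from a chain's autocorrelations and compares a slowly mixing and a quickly mixing toy chain. It is a generic textbook-style estimator applied to synthetic AR(1) chains, not the paper's sampler, repriorisation map, or evaluation code; the names `effective_sample_size` and `ar1_chain`, and the truncation heuristic, are assumptions made for this example.

```python
import numpy as np


def effective_sample_size(chain):
    """Estimate ESS of a 1-D chain as n / (1 + 2 * sum of autocorrelations).

    Assumption: the autocorrelation sum is truncated at the first
    non-positive lag, a simple heuristic rather than the paper's estimator.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    # Biased autocovariance estimates at lags 0..n-1.
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / var
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau


def ar1_chain(phi, n, rng):
    """Hypothetical stand-in for MCMC output: a stationary AR(1) process."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi**2) * rng.normal()
    return x


rng = np.random.default_rng(0)
n_steps = 5000
slow = ar1_chain(0.95, n_steps, rng)  # strongly correlated: poor mixing
fast = ar1_chain(0.20, n_steps, rng)  # weakly correlated: good mixing

ess_slow = effective_sample_size(slow)
ess_fast = effective_sample_size(fast)
print(f"ESS (slow chain): {ess_slow:.0f}")
print(f"ESS (fast chain): {ess_fast:.0f}")
print(f"ESS ratio fast/slow: {ess_fast / ess_slow:.1f}")
```

Both chains have the same length, so the ratio of the two estimates plays the role of the "x times higher effective sample size" comparison in the abstract, here for toy chains only.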

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-hron22a,
  title     = {Wide {B}ayesian neural networks have a simple weight posterior: theory and accelerated sampling},
  author    = {Hron, Jiri and Novak, Roman and Pennington, Jeffrey and Sohl-Dickstein, Jascha},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {8926--8945},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/hron22a/hron22a.pdf},
  url       = {https://proceedings.mlr.press/v162/hron22a.html},
  abstract  = {We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks. Improvements are achieved at all widths, with the margin between reparametrised and standard BNNs growing with layer width.}
}

Endnote

%0 Conference Paper
%T Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
%A Jiri Hron
%A Roman Novak
%A Jeffrey Pennington
%A Jascha Sohl-Dickstein
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hron22a
%I PMLR
%P 8926--8945
%U https://proceedings.mlr.press/v162/hron22a.html
%V 162
%X We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks. Improvements are achieved at all widths, with the margin between reparametrised and standard BNNs growing with layer width.

APA

Hron, J., Novak, R., Pennington, J. & Sohl-Dickstein, J. (2022). Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8926-8945. Available from https://proceedings.mlr.press/v162/hron22a.html.
