Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference

Arnaud Descours; Tom Huix; Arnaud Guillin; Manon Michel; Éric Moulines; Boris Nectoux

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4657-4695, 2023.

Abstract

We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO) which is decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the a priori distribution and the variational posterior. With an appropriate weighting of the KL, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
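
As a reading aid, the weighted objective described in the abstract can be sketched in generic notation. The symbols below are assumptions chosen for illustration and are not necessarily the paper's notation: q_theta is the variational posterior with parameters theta, P_0 is the prior, eta > 0 is the weight on the KL term, and (x_i, y_i), i = 1..n, are the regression data. The abstract does not state how eta is chosen, so it is left as a free parameter here.

```latex
\mathrm{ELBO}_{\eta}(\theta)
  = \sum_{i=1}^{n} \mathbb{E}_{w \sim q_\theta}\bigl[\log p(y_i \mid x_i, w)\bigr]
  \;-\; \eta \, \mathrm{KL}\bigl(q_\theta \,\|\, P_0\bigr)
```

With eta = 1 this reduces to the standard ELBO; other values of eta rescale the KL penalty relative to the expected log-likelihood.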

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-descours23a,
  title = 	 {Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference},
  author =       {Descours, Arnaud and Huix, Tom and Guillin, Arnaud and Michel, Manon and Moulines, {\'E}ric and Nectoux, Boris},
  booktitle = 	 {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages = 	 {4657--4695},
  year = 	 {2023},
  editor = 	 {Neu, Gergely and Rosasco, Lorenzo},
  volume = 	 {195},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v195/descours23a/descours23a.pdf},
  url = 	 {https://proceedings.mlr.press/v195/descours23a.html},
  abstract = 	 {We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO) which is decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the a priori distribution and the variational posterior. With an appropriate weighting of the KL, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.}
}

Endnote

%0 Conference Paper
%T Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference
%A Arnaud Descours
%A Tom Huix
%A Arnaud Guillin
%A Manon Michel
%A Éric Moulines
%A Boris Nectoux
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-descours23a
%I PMLR
%P 4657--4695
%U https://proceedings.mlr.press/v195/descours23a.html
%V 195
%X We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO) which is decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the a priori distribution and the variational posterior. With an appropriate weighting of the KL, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.

APA


Descours, A., Huix, T., Guillin, A., Michel, M., Moulines, É. & Nectoux, B. (2023). Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:4657-4695. Available from https://proceedings.mlr.press/v195/descours23a.html.
