Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference

Arnaud Descours, Tom Huix, Arnaud Guillin, Manon Michel, Éric Moulines, Boris Nectoux
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4657-4695, 2023.

Abstract

We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO), which decomposes into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. With an appropriate weighting of the KL term, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
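To make scheme (ii) concrete, the sketch below assembles a few-sample, reparametrized minibatch estimate of the (negative) regularized ELBO for a mean-field Gaussian variational posterior over the weights of a two-layer regression network. This is a hedged illustration based only on the abstract, not the paper's exact construction: the 1/N mean-field scaling, the tanh activation, the softplus parametrization of the standard deviations and the 1/N KL weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer(x, W, c):
    """Two-layer network with N hidden units and illustrative 1/N mean-field scaling."""
    N = W.shape[0]
    return (c * np.tanh(W @ x)).sum() / N

def kl_gaussian(mu, rho):
    """KL( N(mu, softplus(rho)^2) || N(0, 1) ), summed over all coordinates."""
    sigma = np.log1p(np.exp(rho))  # softplus keeps sigma > 0
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

def minibatch_neg_elbo(mu_W, rho_W, mu_c, rho_c, xb, yb, n_data, kl_weight, n_mc=1):
    """Monte Carlo estimate of the negative regularized ELBO on a minibatch,
    using the reparametrization trick (Bayes-by-Backprop style)."""
    sig_W = np.log1p(np.exp(rho_W))
    sig_c = np.log1p(np.exp(rho_c))
    nll = 0.0
    for _ in range(n_mc):
        # reparametrization: theta = mu + sigma * eps, eps ~ N(0, I)
        W = mu_W + sig_W * rng.standard_normal(mu_W.shape)
        c = mu_c + sig_c * rng.standard_normal(mu_c.shape)
        preds = np.array([two_layer(x, W, c) for x in xb])
        # Gaussian log-likelihood up to a constant -> squared error
        nll += 0.5 * np.mean((preds - yb) ** 2)
    nll /= n_mc
    # rescale the minibatch loss to the full dataset, add the weighted KL term
    kl = kl_gaussian(mu_W, rho_W) + kl_gaussian(mu_c, rho_c)
    return n_data * nll + kl_weight * kl

# toy usage: N hidden units, d-dimensional inputs, synthetic data
N, d, n_data = 50, 3, 200
X = rng.standard_normal((n_data, d)); y = rng.standard_normal(n_data)
mu_W, rho_W = 0.1 * rng.standard_normal((N, d)), -3.0 * np.ones((N, d))
mu_c, rho_c = 0.1 * rng.standard_normal(N), -3.0 * np.ones(N)
idx = rng.choice(n_data, size=20, replace=False)
print(minibatch_neg_elbo(mu_W, rho_W, mu_c, rho_c, X[idx], y[idx],
                         n_data, kl_weight=1.0 / N))
```

In practice one would differentiate this estimate with respect to the variational parameters (mu, rho) with an autodiff library and take stochastic gradient steps; the choice of KL weight as a function of the width N is exactly the kind of scaling the paper's law of large numbers depends on.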

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-descours23a,
  title     = {Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference},
  author    = {Descours, Arnaud and Huix, Tom and Guillin, Arnaud and Michel, Manon and Moulines, {\'E}ric and Nectoux, Boris},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {4657--4695},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/descours23a/descours23a.pdf},
  url       = {https://proceedings.mlr.press/v195/descours23a.html}
}
Endnote
%0 Conference Paper
%T Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference
%A Arnaud Descours
%A Tom Huix
%A Arnaud Guillin
%A Manon Michel
%A Éric Moulines
%A Boris Nectoux
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-descours23a
%I PMLR
%P 4657--4695
%U https://proceedings.mlr.press/v195/descours23a.html
%V 195
APA
Descours, A., Huix, T., Guillin, A., Michel, M., Moulines, É. & Nectoux, B. (2023). Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:4657-4695. Available from https://proceedings.mlr.press/v195/descours23a.html.
