Embedded Ensembles: infinite width limit and operating regimes

Maksim Velikanov, Roman V. Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:3138-3163, 2022.

Abstract

A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); particular examples of it are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide-network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of the ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical predictions with a wide range of experiments on finite networks, and further study empirically various effects such as the transition between the two regimes, the scaling of ensemble performance with the network width and number of models, and the dependence of performance on architecture and hyperparameter choices.
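
For concreteness, below is a minimal, self-contained Python/NumPy sketch of the weight-sharing idea behind such an embedded ensemble, in the BatchEnsemble style: all members share one reference weight matrix and each member owns only cheap rank-1 modulation vectors. The class name, the +/-1 sign initialization of the modulation vectors, and the dimensions are illustrative assumptions, not the authors' implementation.

import numpy as np

class EmbeddedEnsembleLinear:
    """K ensemble members share one reference weight matrix W; member k owns
    only rank-1 modulation vectors r_k, s_k, so its effective weight is
    W * (s_k r_k^T) and the extra memory grows as O(K * (n_in + n_out))."""

    def __init__(self, n_in, n_out, n_members, seed=0):
        rng = np.random.default_rng(seed)
        # Shared reference weights: a single copy for the whole ensemble.
        self.W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
        # Per-member modulations; random +/-1 signs are one common choice for
        # BatchEnsemble-like layers (an assumption here, not the paper's choice).
        self.r = rng.choice([-1.0, 1.0], size=(n_members, n_in))
        self.s = rng.choice([-1.0, 1.0], size=(n_members, n_out))

    def forward(self, x):
        # x: (n_members, batch, n_in) -> (n_members, batch, n_out).
        # Member k computes s_k * (W @ (r_k * x)): one shared matmul sandwiched
        # between two cheap per-member elementwise scalings.
        scaled_in = x * self.r[:, None, :]
        shared = np.einsum('oi,kbi->kbo', self.W, scaled_in)
        return shared * self.s[:, None, :]

layer = EmbeddedEnsembleLinear(n_in=64, n_out=32, n_members=4)
x = np.random.default_rng(1).standard_normal((4, 8, 64))  # same batch per member
print(layer.forward(x).shape)                              # (4, 8, 32)

Monte-Carlo dropout ensembles fit the same template, with the per-member modulation vectors replaced by random binary masks applied to the shared weights or activations.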

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-velikanov22a,
  title     = {Embedded Ensembles: infinite width limit and operating regimes},
  author    = {Velikanov, Maksim and Kail, Roman V. and Anokhin, Ivan and Vashurin, Roman and Panov, Maxim and Zaytsev, Alexey and Yarotsky, Dmitry},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {3138--3163},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/velikanov22a/velikanov22a.pdf},
  url       = {https://proceedings.mlr.press/v151/velikanov22a.html},
  abstract  = {A memory efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different number of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes - independent and collective - depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as transition between the two regimes, scaling of ensemble performance with the network width and number of models, and dependence of performance on a number of architecture and hyperparameter choices.}
}
Endnote
%0 Conference Paper
%T Embedded Ensembles: infinite width limit and operating regimes
%A Maksim Velikanov
%A Roman V. Kail
%A Ivan Anokhin
%A Roman Vashurin
%A Maxim Panov
%A Alexey Zaytsev
%A Dmitry Yarotsky
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-velikanov22a
%I PMLR
%P 3138--3163
%U https://proceedings.mlr.press/v151/velikanov22a.html
%V 151
%X A memory efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different number of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes - independent and collective - depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as transition between the two regimes, scaling of ensemble performance with the network width and number of models, and dependence of performance on a number of architecture and hyperparameter choices.
APA
Velikanov, M., Kail, R.V., Anokhin, I., Vashurin, R., Panov, M., Zaytsev, A. & Yarotsky, D. (2022). Embedded Ensembles: infinite width limit and operating regimes. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:3138-3163. Available from https://proceedings.mlr.press/v151/velikanov22a.html.