From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:1199-1227, 2023.

Abstract

This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised “mean-field” regime of large network width, covering as well the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
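To make the setting concrete, below is a minimal numerical sketch (not the authors' code) of the dynamics the abstract describes: one-pass SGD for a two-layer "soft committee" network on fresh Gaussian samples, with labels produced by a similar teacher network. The overlap matrices Q, M, P computed at the end are the sufficient statistics of the population risk; the paper's contribution is a rigorous proof that, in the limits discussed above, their evolution concentrates on deterministic low-dimensional ODEs. The dimensions, learning rate, tanh activation, and the choice to train only the first layer are illustrative assumptions.

# Sketch: one-pass SGD on a soft committee machine with Gaussian data,
# tracking the overlap (sufficient-statistic) matrices. All hyperparameters
# are illustrative, not taken from the paper.
import numpy as np

d, p, k = 1000, 4, 2   # input dimension, student width, teacher width
gamma = 0.5            # learning rate (assumption)
T = 10_000             # SGD steps; one fresh sample per step (one-pass)
rng = np.random.default_rng(0)

def sigma(z):          # activation (illustrative choice)
    return np.tanh(z)

def dsigma(z):         # derivative of the activation
    return 1.0 - np.tanh(z) ** 2

Wstar = rng.standard_normal((k, d))   # fixed teacher first-layer weights
W = rng.standard_normal((p, d))       # student first-layer weights

for t in range(T):
    x = rng.standard_normal(d)                  # fresh Gaussian sample
    lam_star = Wstar @ x / np.sqrt(d)           # teacher pre-activations
    lam = W @ x / np.sqrt(d)                    # student pre-activations
    y_star = sigma(lam_star).mean()             # teacher label
    y = sigma(lam).mean()                       # student prediction
    # SGD step on the squared loss (1/2)(y - y*)^2; the 1/sqrt(d) factor
    # matches the pre-activation scaling above
    grad = (y - y_star) * dsigma(lam)[:, None] * x[None, :] / (p * np.sqrt(d))
    W -= gamma * grad

# Sufficient statistics of the population risk: the limiting theory tracks
# these low-dimensional overlaps instead of the d-dimensional weights.
Q = W @ W.T / d           # student-student overlaps (p x p)
M = W @ Wstar.T / d       # student-teacher overlaps (p x k)
P = Wstar @ Wstar.T / d   # teacher-teacher overlaps (k x k), fixed
print(Q, M, P, sep="\n")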

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-arnaboldi23a,
  title     = {From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks},
  author    = {Arnaboldi, Luca and Stephan, Ludovic and Krzakala, Florent and Loureiro, Bruno},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {1199--1227},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/arnaboldi23a/arnaboldi23a.pdf},
  url       = {https://proceedings.mlr.press/v195/arnaboldi23a.html},
  abstract  = {This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised “mean-field” regime of large network width, covering as well the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.}
}
Endnote
%0 Conference Paper
%T From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
%A Luca Arnaboldi
%A Ludovic Stephan
%A Florent Krzakala
%A Bruno Loureiro
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-arnaboldi23a
%I PMLR
%P 1199--1227
%U https://proceedings.mlr.press/v195/arnaboldi23a.html
%V 195
%X This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised “mean-field” regime of large network width, covering as well the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
APA
Arnaboldi, L., Stephan, L., Krzakala, F. & Loureiro, B. (2023). From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:1199-1227. Available from https://proceedings.mlr.press/v195/arnaboldi23a.html.
