Mean-field analysis of polynomial-width two-layer neural network beyond finite time horizon

Margalit Glasgow, Denny Wu, Joan Bruna
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:2461-2539, 2025.

Abstract

We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessian of each particle, defined as the derivative of the particle’s velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension d. We show that, due to a certain "self-concordance" property in these problems, where the local Hessian of a particle is bounded by a constant times the particle’s velocity, polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.
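As a reading aid, the central objects in the abstract can be sketched schematically as follows; the notation here is ours, not the paper's, and is meant only to fix ideas. Let $\mu_t$ denote the mean-field distribution over neurons and $v(w;\mu_t)$ the velocity of a particle at position $w$ under the mean-field dynamics:

\[
\frac{d}{dt}\,\bar{w}_t = v(\bar{w}_t;\mu_t),
\qquad
H(w;\mu_t) := \nabla_w\, v(w;\mu_t) \quad \text{(the local Hessian)}.
\]

To first order, the gap $\Delta_t$ between a finite-width particle and its mean-field counterpart then obeys a differential inequality of the form

\[
\frac{d}{dt}\,\Delta_t \;\lesssim\; \|H(\bar{w}_t;\mu_t)\|\,\Delta_t \;+\; (\text{width-dependent fluctuation}),
\]

so the gap at time $T$ carries a Grönwall factor $\exp\!\big(\int_0^T \|H(\bar{w}_t;\mu_t)\|\,dt\big)$. Under a self-concordance condition of the kind described above, $\|H(\bar{w}_t;\mu_t)\| \le C\,\|v(\bar{w}_t;\mu_t)\|$, this exponent is controlled by $C$ times the total path length traveled by the particle, which can remain polynomially bounded even when the training horizon $T$ itself grows polynomially in $d$.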

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-glasgow25a,
  title     = {Mean-field analysis of polynomial-width two-layer neural network beyond finite time horizon},
  author    = {Glasgow, Margalit and Wu, Denny and Bruna, Joan},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {2461--2539},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/glasgow25a/glasgow25a.pdf},
  url       = {https://proceedings.mlr.press/v291/glasgow25a.html},
  abstract  = {We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessian of each particle, defined as the derivative of the particle’s velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension d. We show that, due to a certain "self-concordance" property in these problems, where the local Hessian of a particle is bounded by a constant times the particle’s velocity, polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.}
}
Endnote
%0 Conference Paper
%T Mean-field analysis of polynomial-width two-layer neural network beyond finite time horizon
%A Margalit Glasgow
%A Denny Wu
%A Joan Bruna
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-glasgow25a
%I PMLR
%P 2461--2539
%U https://proceedings.mlr.press/v291/glasgow25a.html
%V 291
%X We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessian of each particle, defined as the derivative of the particle’s velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension d. We show that, due to a certain "self-concordance" property in these problems, where the local Hessian of a particle is bounded by a constant times the particle’s velocity, polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.
APA
Glasgow, M., Wu, D., & Bruna, J. (2025). Mean-field analysis of polynomial-width two-layer neural network beyond finite time horizon. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:2461-2539. Available from https://proceedings.mlr.press/v291/glasgow25a.html.