SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

Emmanuel Abbe, Enric Boix Adserà, Theodor Misiakiewicz
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:2552-2623, 2023.

Abstract

We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which measures how “hierarchical” target functions are. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $$\tilde{\Theta}\big(d^{\max(\mathrm{Leap}(f),2)}\big)\,.$$ We prove a version of this conjecture for a class of functions on Gaussian isotropic data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from Abbe et al. ’22 by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower bounds.
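For intuition, the leap of a function with low-dimensional support can be read off from the coordinate supports of its basis elements (Fourier monomials in the Boolean case, Hermite tensors in the Gaussian case). The brute-force sketch below is an illustration of that idea, not code from the paper: it computes the leap as the minimum over orderings of the support sets of the largest number of new coordinates revealed at any single step. The function name and the permutation-based approach are assumptions for the example only.

```python
from itertools import permutations

def leap(support_sets):
    """Leap of a function whose basis elements have the given coordinate supports.

    Illustration of the leap measure: minimize, over orderings of the support
    sets, the largest number of *new* coordinates introduced when the sets are
    revealed one at a time. Brute force, so only suitable for small supports.
    """
    sets = [frozenset(s) for s in support_sets]
    best = None
    for order in permutations(sets):
        seen = set()
        worst_jump = 0
        for s in order:
            worst_jump = max(worst_jump, len(s - seen))  # new coordinates at this step
            seen |= s
        best = worst_jump if best is None else min(best, worst_jump)
    return best

# Staircase-like support x1 + x1*x2 + x1*x2*x3: each step adds one new coordinate.
print(leap([{1}, {1, 2}, {1, 2, 3}]))  # 1
# A lone degree-3 parity x1*x2*x3: all three coordinates appear at once.
print(leap([{1, 2, 3}]))               # 3
```

Under the conjecture above, the staircase example (leap 1) would be learnable in roughly $d^{2}$ steps, while the lone degree-3 parity (leap 3) would require roughly $d^{3}$.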

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-abbe23a,
  title     = {SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics},
  author    = {Abbe, Emmanuel and Adser{\`a}, Enric Boix and Misiakiewicz, Theodor},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {2552--2623},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/abbe23a/abbe23a.pdf},
  url       = {https://proceedings.mlr.press/v195/abbe23a.html},
  abstract  = {We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which measures how “hierarchical” target functions are. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $$\tilde{\Theta}\big(d^{\max(\mathrm{Leap}(f),2)}\big)\,.$$ We prove a version of this conjecture for a class of functions on Gaussian isotropic data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from Abbe et al. ’22 by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower bounds.}
}
Endnote
%0 Conference Paper
%T SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
%A Emmanuel Abbe
%A Enric Boix Adserà
%A Theodor Misiakiewicz
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-abbe23a
%I PMLR
%P 2552--2623
%U https://proceedings.mlr.press/v195/abbe23a.html
%V 195
%X We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which measures how “hierarchical” target functions are. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $$\tilde{\Theta}\big(d^{\max(\mathrm{Leap}(f),2)}\big)\,.$$ We prove a version of this conjecture for a class of functions on Gaussian isotropic data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from Abbe et al. ’22 by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower bounds.
APA
Abbe, E., Adserà, E.B. & Misiakiewicz, T. (2023). SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:2552-2623. Available from https://proceedings.mlr.press/v195/abbe23a.html.
