The Dynamics of Gradient Descent for Overparametrized Neural Networks

Siddhartha Satpathi, R Srikant
Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:373-384, 2021.

Abstract

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well if the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the dynamics of neural network weights under GD converge to a point which is close to the minimum norm solution subject to the condition that there is no training error when using the linear approximation to the neural network.
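To give a feel for the result, here is a minimal numerical sketch (not taken from the paper): gradient descent on a linearized, overparameterized least-squares model, started from the reference parameters, drives the training error to zero and ends up at the minimum-norm interpolating solution. The Jacobian J, targets y, learning rate, and iteration count below are illustrative stand-ins, not quantities from the paper.

import numpy as np

# Sketch: GD on the linearized model f_lin(theta) = f(theta0) + J (theta - theta0).
# With p >> n (overparameterized), GD on the squared loss started at theta0
# converges to the interpolating solution of minimum norm ||theta - theta0||.

rng = np.random.default_rng(0)
n, p = 20, 500                      # n samples, p >> n parameters
J = rng.normal(size=(n, p))         # stand-in for the Jacobian at initialization
y = rng.normal(size=n)              # stand-in for the residual targets y - f(theta0)

# Gradient descent on 0.5 * ||J w - y||^2, where w = theta - theta0.
w = np.zeros(p)
lr = 1.0 / np.linalg.norm(J, 2) ** 2   # step size below 2 / sigma_max(J)^2
for _ in range(1000):
    w -= lr * J.T @ (J @ w - y)

# Closed-form minimum-norm interpolant: w* = J^T (J J^T)^{-1} y.
w_min_norm = J.T @ np.linalg.solve(J @ J.T, y)

print("training error:", np.linalg.norm(J @ w - y))                     # ~0
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm)) # ~0

The GD iterates stay in the row space of J (the span of the gradients), which is why the limit is the minimum-norm solution rather than an arbitrary interpolant; the paper establishes the analogous statement for the actual network weights via a Lyapunov argument.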

Cite this Paper

BibTeX
@InProceedings{pmlr-v144-satpathi21a,
  title     = {The Dynamics of Gradient Descent for Overparametrized Neural Networks},
  author    = {Satpathi, Siddhartha and Srikant, R},
  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  pages     = {373--384},
  year      = {2021},
  editor    = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
  volume    = {144},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 June},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v144/satpathi21a/satpathi21a.pdf},
  url       = {https://proceedings.mlr.press/v144/satpathi21a.html},
  abstract  = {We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well if the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the dynamics of neural network weights under GD converge to a point which is close to the minimum norm solution subject to the condition that there is no training error when using the linear approximation to the neural network.}
}
Endnote
%0 Conference Paper
%T The Dynamics of Gradient Descent for Overparametrized Neural Networks
%A Siddhartha Satpathi
%A R Srikant
%B Proceedings of the 3rd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2021
%E Ali Jadbabaie
%E John Lygeros
%E George J. Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire J. Tomlin
%E Melanie N. Zeilinger
%F pmlr-v144-satpathi21a
%I PMLR
%P 373--384
%U https://proceedings.mlr.press/v144/satpathi21a.html
%V 144
%X We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well if the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the dynamics of neural network weights under GD converge to a point which is close to the minimum norm solution subject to the condition that there is no training error when using the linear approximation to the neural network.
APA
Satpathi, S. & Srikant, R. (2021). The Dynamics of Gradient Descent for Overparametrized Neural Networks. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:373-384. Available from https://proceedings.mlr.press/v144/satpathi21a.html.