Gradient Descent Finds Global Minima of Deep Neural Networks

Simon Du, Jason Lee, Haochuan Li, Liwei Wang, Xiyu Zhai
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1675-1685, 2019.

Abstract

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
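As a purely illustrative sketch of the phenomenon described above (not the paper's deep ResNet analysis), the Python snippet below trains a heavily over-parameterized two-layer ReLU network with plain gradient descent on a small synthetic dataset and tracks both the training loss and the minimum eigenvalue of the empirical Gram matrix H_ij = (1/m) <x_i, x_j> #{r : w_r^T x_i >= 0 and w_r^T x_j >= 0}. This is the two-layer analogue of the architecture-induced Gram matrix whose stability drives the convergence argument; the data, width, and learning rate are arbitrary choices made for illustration only.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 20, 2000        # n training points, input dimension d, hidden width m >> n
lr, steps = 0.5, 1000         # step size and number of gradient descent iterations

# Synthetic training set: unit-norm inputs, arbitrary real labels.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)

# Two-layer network u(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x).
# Only the first-layer weights W are trained; the output signs a_r are frozen.
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def forward(W):
    pre = X @ W.T                                 # (n, m) pre-activations w_r^T x_i
    u = np.maximum(pre, 0.0) @ a / np.sqrt(m)     # network outputs on the training set
    return u, pre

def gram_min_eig(pre):
    # H_ij = (1/m) <x_i, x_j> * #{r : w_r^T x_i >= 0 and w_r^T x_j >= 0}
    pattern = (pre >= 0).astype(float)
    H = (X @ X.T) * (pattern @ pattern.T) / m
    return np.linalg.eigvalsh(H)[0]               # eigenvalues are sorted ascending

for t in range(steps + 1):
    u, pre = forward(W)
    loss = 0.5 * np.sum((u - y) ** 2)
    if t % 200 == 0:
        print(f"step {t:4d}   loss {loss:.6e}   lambda_min(H) {gram_min_eig(pre):.4f}")
    # Gradient of the squared loss w.r.t. W (subgradient taken as 0 at the ReLU kink).
    residual = u - y                                            # (n,)
    grad_act = np.outer(residual, a / np.sqrt(m)) * (pre > 0)   # (n, m)
    W -= lr * (grad_act.T @ X)                                  # (m, d) gradient step

With the width this large relative to the sample size, the smallest eigenvalue of H typically stays bounded away from zero throughout training and the loss decays geometrically toward zero, which is the qualitative behavior that the paper's convergence theorems formalize for deep residual (and residual convolutional) architectures.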

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-du19c,
  title     = {Gradient Descent Finds Global Minima of Deep Neural Networks},
  author    = {Du, Simon and Lee, Jason and Li, Haochuan and Wang, Liwei and Zhai, Xiyu},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1675--1685},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/du19c/du19c.pdf},
  url       = {https://proceedings.mlr.press/v97/du19c.html},
  abstract  = {Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.}
}
Endnote
%0 Conference Paper
%T Gradient Descent Finds Global Minima of Deep Neural Networks
%A Simon Du
%A Jason Lee
%A Haochuan Li
%A Liwei Wang
%A Xiyu Zhai
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-du19c
%I PMLR
%P 1675--1685
%U https://proceedings.mlr.press/v97/du19c.html
%V 97
%X Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
APA
Du, S., Lee, J., Li, H., Wang, L. & Zhai, X. (2019). Gradient Descent Finds Global Minima of Deep Neural Networks. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1675-1685. Available from https://proceedings.mlr.press/v97/du19c.html.
