Geometry of Neural Network Loss Surfaces via Random Matrix Theory

Jeffrey Pennington, Yasaman Bahri
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2798-2806, 2017.

Abstract

Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, $\phi$, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $1/2(1-\phi)^2$.
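The framework described above models the Hessian at a critical point, roughly speaking, as the sum of a positive semi-definite (Wishart-like) piece and an indefinite (Wigner-like) piece whose scale grows with the energy, so that the spectrum interpolates between Marchenko-Pastur and semicircle shapes. The short numerical sketch below samples such a surrogate and reports the fraction of negative eigenvalues (the index) as the energy and the ratio $\phi$ vary. It is an illustrative sketch only: the normalizations, parameter values, and function names are assumptions of this example, not the authors' code.

# Hedged sketch (not the authors' implementation): sample a Wishart-plus-Wigner
# surrogate Hessian and measure how the index (fraction of negative eigenvalues)
# grows with the energy. Normalizations below are illustrative assumptions.
import numpy as np

def surrogate_hessian(n_params, n_data, energy, rng):
    """Return H = H0 + H1 and phi = n_params / n_data.

    H0 is positive semi-definite (Marchenko-Pastur-type spectrum);
    H1 is a symmetric Wigner-type matrix whose scale grows with the energy.
    """
    phi = n_params / n_data
    J = rng.standard_normal((n_params, n_data)) / np.sqrt(n_data)
    H0 = J @ J.T                                              # Wishart part
    A = rng.standard_normal((n_params, n_params))
    H1 = np.sqrt(energy) * (A + A.T) / np.sqrt(2 * n_params)  # Wigner part
    return H0 + H1, phi

rng = np.random.default_rng(0)
for energy in [0.0, 0.05, 0.2, 0.5]:
    H, phi = surrogate_hessian(n_params=500, n_data=1000, energy=energy, rng=rng)
    eigs = np.linalg.eigvalsh(H)
    index = np.mean(eigs < 0)          # fraction of negative eigenvalues
    print(f"phi={phi:.2f}  energy={energy:.2f}  index={index:.3f}")

At zero energy the surrogate is positive semi-definite and the index vanishes; as the energy increases, negative eigenvalues appear, which is the qualitative behavior the abstract's 3/2-power scaling refers to.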

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-pennington17a,
  title     = {Geometry of Neural Network Loss Surfaces via Random Matrix Theory},
  author    = {Jeffrey Pennington and Yasaman Bahri},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {2798--2806},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/pennington17a/pennington17a.pdf},
  url       = {https://proceedings.mlr.press/v70/pennington17a.html},
  abstract  = {Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, $\phi$, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $1/2(1-\phi)^2$.}
}
Endnote
%0 Conference Paper
%T Geometry of Neural Network Loss Surfaces via Random Matrix Theory
%A Jeffrey Pennington
%A Yasaman Bahri
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-pennington17a
%I PMLR
%P 2798--2806
%U https://proceedings.mlr.press/v70/pennington17a.html
%V 70
%X Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, $\phi$, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $1/2(1-\phi)^2$.
APA
Pennington, J. & Bahri, Y. (2017). Geometry of Neural Network Loss Surfaces via Random Matrix Theory. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2798-2806. Available from https://proceedings.mlr.press/v70/pennington17a.html.