An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

Behrooz Ghorbani, Shankar Krishnan, Ying Xiao
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2232-2241, 2019.

Abstract

To understand the dynamics of training in deep neural networks, we study the evolution of the Hessian eigenvalue density throughout the optimization process. In non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In a batch normalized network, these two effects are almost absent. We give a theoretical rationale to partially explain these phenomena. As part of this work, we adapt advanced tools from numerical linear algebra that allow scalable and accurate estimation of the entire Hessian spectrum of ImageNet-scale neural networks; this technique may be of independent interest in other applications.
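The "advanced tools from numerical linear algebra" the abstract alludes to are in the family of stochastic Lanczos quadrature: the Hessian spectrum is probed only through Hessian-vector products, and the Lanczos tridiagonalization of each random probe yields Ritz values and quadrature weights that are averaged into a smoothed density. Below is a minimal, self-contained NumPy sketch of that idea, not the paper's implementation: the function names (lanczos, spectral_density, hvp) are illustrative, a small random symmetric matrix stands in for the Hessian, and the smoothing bandwidth and number of probes are arbitrary choices.

    # Hedged sketch: Lanczos-quadrature-style estimation of a spectral density.
    # A random symmetric matrix stands in for the Hessian; in practice hvp(v)
    # would be a Hessian-vector product computed with automatic differentiation.
    import numpy as np

    def lanczos(hvp, dim, num_steps, rng):
        """Run Lanczos from a random start vector; return tridiagonal coefficients."""
        v = rng.standard_normal(dim)
        v /= np.linalg.norm(v)
        v_prev = np.zeros(dim)
        alphas, betas = [], []
        beta = 0.0
        for _ in range(num_steps):
            w = hvp(v) - beta * v_prev          # three-term recurrence
            alpha = np.dot(w, v)
            w -= alpha * v
            beta = np.linalg.norm(w)
            alphas.append(alpha)
            betas.append(beta)
            if beta < 1e-10:                    # invariant subspace found; stop early
                break
            v_prev, v = v, w / beta
        return np.array(alphas), np.array(betas[:-1])

    def spectral_density(hvp, dim, grid, num_steps=30, num_draws=10, sigma=0.05, seed=0):
        """Average Gaussian-smoothed Ritz-value densities over random probe vectors."""
        rng = np.random.default_rng(seed)
        density = np.zeros_like(grid)
        for _ in range(num_draws):
            a, b = lanczos(hvp, dim, num_steps, rng)
            T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
            ritz_vals, ritz_vecs = np.linalg.eigh(T)
            weights = ritz_vecs[0, :] ** 2      # quadrature weights: squared first components
            for lam, w in zip(ritz_vals, weights):
                density += w * np.exp(-(grid - lam) ** 2 / (2 * sigma ** 2))
        return density / (num_draws * np.sqrt(2 * np.pi) * sigma)

    # Toy usage: a Wigner-type matrix whose eigenvalues lie roughly in [-2, 2].
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 500))
    H = (A + A.T) / np.sqrt(2 * 500)
    grid = np.linspace(-2.5, 2.5, 300)
    density = spectral_density(lambda v: H @ v, dim=500, grid=grid)

At ImageNet scale the only change of substance is that hvp is implemented without forming the Hessian, e.g. via a Hessian-vector product (Pearlmutter-style double backpropagation) over mini-batches, so each Lanczos step costs roughly two gradient evaluations.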

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-ghorbani19b,
  title     = {An Investigation into Neural Net Optimization via Hessian Eigenvalue Density},
  author    = {Ghorbani, Behrooz and Krishnan, Shankar and Xiao, Ying},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2232--2241},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/ghorbani19b/ghorbani19b.pdf},
  url       = {https://proceedings.mlr.press/v97/ghorbani19b.html}
}
Endnote
%0 Conference Paper
%T An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
%A Behrooz Ghorbani
%A Shankar Krishnan
%A Ying Xiao
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-ghorbani19b
%I PMLR
%P 2232--2241
%U https://proceedings.mlr.press/v97/ghorbani19b.html
%V 97
APA
Ghorbani, B., Krishnan, S. & Xiao, Y. (2019). An Investigation into Neural Net Optimization via Hessian Eigenvalue Density. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2232-2241. Available from https://proceedings.mlr.press/v97/ghorbani19b.html.
