Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

Vardan Papyan
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5012-5021, 2019.

Abstract

We expose a structure in deep classifying neural networks in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
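Two observations in the abstract can be checked with plain linear algebra. First, a second moment matrix differs from a covariance by the outer product of the mean. Second, well-separated means show up as outlier eigenvalues whose eigenvectors can be approximated by averaging instead of eigenanalysis. The sketch below illustrates this on synthetic data, under loudly stated assumptions:

- the "gradients" are random vectors built around hypothetical class means, not gradients of a real network;
- the sizes `C`, `n`, `p` and the helpers `second_moment` and `covariance` are illustrative names;
- it shows only a simplified one-way (class-mean) version of the mechanism, and does not reproduce the paper's two-way additive structure or its Hessian decomposition.

```python
import numpy as np


def second_moment(G: np.ndarray) -> np.ndarray:
    """Uncentered second moment (1/N) * sum_i g_i g_i^T over the rows g_i of G."""
    return G.T @ G / G.shape[0]


def covariance(G: np.ndarray) -> np.ndarray:
    """Centered covariance (1/N) * sum_i (g_i - mu)(g_i - mu)^T, with mu the row mean."""
    Gc = G - G.mean(axis=0)
    return Gc.T @ Gc / G.shape[0]


# Hypothetical toy sizes: C classes, n samples per class, p "parameters".
C, n, p = 5, 200, 50
rng = np.random.default_rng(0)

# Synthetic stand-ins for per-sample gradients (NOT real network gradients):
# each class has its own mean vector plus small isotropic noise.
true_means = 3.0 * rng.standard_normal((C, p))
labels = np.repeat(np.arange(C), n)
G = true_means[labels] + 0.5 * rng.standard_normal((C * n, p))

# 1) A second moment matrix equals the covariance plus the outer product of
#    the mean, so it is influenced by the mean of the vectors.
mu = G.mean(axis=0)
S = second_moment(G)
identity_holds = np.allclose(S, covariance(G) + np.outer(mu, mu))

# 2) Separated class means produce C outlier eigenvalues in S.
evals, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
top_subspace = evecs[:, -C:]            # eigenvectors of the C largest eigenvalues

# 3) "Averaging": per-class mean vectors approximately span the same subspace,
#    obtained without a p x p eigendecomposition.
class_means = np.stack([G[labels == c].mean(axis=0) for c in range(C)])
Q, _ = np.linalg.qr(class_means.T)      # p x C orthonormal basis of their span

# Cosines of the principal angles between the two C-dimensional subspaces
# (all close to 1 when the subspaces nearly coincide).
cosines = np.linalg.svd(top_subspace.T @ Q, compute_uv=False)

print("second moment = covariance + mean outer product:", identity_holds)
print("top eigenvalues:", np.round(evals[-C:], 2))
print("largest bulk eigenvalue:", round(float(evals[-C - 1]), 3))
print("min principal-angle cosine:", round(float(cosines.min()), 4))
```

In this toy, `p` is small enough for `np.linalg.eigh`. When `p` is a network's parameter count, a dense `p x p` eigendecomposition is out of reach, and that is the motivation the abstract gives for the averaging route.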

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-papyan19a,
  title = {Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians},
  author = {Papyan, Vardan},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {5012--5021},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/papyan19a/papyan19a.pdf},
  url = {https://proceedings.mlr.press/v97/papyan19a.html},
  abstract = {We expose a structure in deep classifying neural networks in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.}
}
EndNote
%0 Conference Paper
%T Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
%A Vardan Papyan
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-papyan19a
%I PMLR
%P 5012--5021
%U https://proceedings.mlr.press/v97/papyan19a.html
%V 97
%X We expose a structure in deep classifying neural networks in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
APA
Papyan, V. (2019). Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5012-5021. Available from https://proceedings.mlr.press/v97/papyan19a.html.
