Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5012-5021, 2019.
Abstract
We expose a structure in deep classifying neural networks in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
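The distinction between a covariance and a second moment matrix can be illustrated with a small numerical sketch. This is a toy example on synthetic vectors, not the paper's experiment: the array `g`, its dimensions, and the chosen mean are all assumptions made for illustration. It checks the standard identity that the second moment equals the covariance plus the outer product of the mean, and shows that a nonzero mean alone can produce an outlying top eigenvalue.

```python
import numpy as np

# Hypothetical stand-in for per-sample gradients: n vectors of dimension d.
# The nonzero mean along the first coordinate is an assumption for illustration.
rng = np.random.default_rng(0)
n, d = 1000, 5
mean = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
g = rng.normal(size=(n, d)) + mean

mu = g.mean(axis=0)

# Second moment matrix: average of outer products, with no centering.
second_moment = g.T @ g / n

# Covariance matrix: average of outer products of the centered vectors.
centered = g - mu
covariance = centered.T @ centered / n

# Identity: second moment = covariance + mu mu^T.
gap = np.abs(second_moment - (covariance + np.outer(mu, mu))).max()

# eigvalsh returns eigenvalues in ascending order, so [-1] is the largest.
top_second = np.linalg.eigvalsh(second_moment)[-1]
top_cov = np.linalg.eigvalsh(covariance)[-1]

print("max deviation from identity:", gap)
print("top eigenvalue, second moment:", top_second)
print("top eigenvalue, covariance:", top_cov)
```

In this toy setting the top eigenvalue of the second moment matrix sits well above that of the covariance, and the difference comes entirely from the mean term.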