Fisher Information and Natural Gradient Learning in Random Deep Networks
[edit]
Proceedings of Machine Learning Research, PMLR 89:694702, 2019.
Abstract
The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, however, which is practically difficult. The present paper uses statistical neurodynamical method to reveal the properties of the Fisher information matrix in a net of random connections. We prove that the Fisher information matrix is unitwise block diagonal supplemented by small order terms of offblockdiagonal elements. We further prove that the Fisher information matrix of a single unit has a simple reduced form, a sum of a diagonal matrix and a rank 2 matrix of weightbias correlations. We obtain the inverse of Fisher information explicitly. We then have an explicit form of the approximate natural gradient, without relying on the matrix inversion.
Related Material


