Fisher Information and Natural Gradient Learning in Random Deep Networks
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:694-702, 2019.
The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher information matrix, which is difficult in practice. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a network with random connections. We prove that the Fisher information matrix is unit-wise block diagonal, supplemented by small-order off-block-diagonal terms. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. From this form we obtain the inverse of the Fisher information matrix explicitly, which gives an explicit form of the approximate natural gradient that does not rely on matrix inversion.
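To illustrate why a "diagonal plus rank-2" structure makes the inverse explicit, the following is a minimal sketch rather than the paper's own formula. It assumes a generic matrix of the form F = diag(d) + U C Uᵀ, where U = [u1 u2] has two columns and C is a 2×2 matrix, and applies the standard Woodbury-type identity F⁻¹ = D⁻¹ − D⁻¹U (I + C UᵀD⁻¹U)⁻¹ C UᵀD⁻¹. The function name `solve_diag_plus_rank2` and all argument names are hypothetical; the actual entries of d, U, and C for a random network are derived in the paper and are not reproduced here.

```python
def solve_diag_plus_rank2(d, u1, u2, c, g):
    """Solve (diag(d) + U C U^T) x = g for x, with U = [u1 u2] and C a 2x2 matrix.

    Illustrative sketch: only a 2x2 system is inverted, so the cost is O(n)
    instead of the O(n^3) of a general dense solve.
    """
    n = len(d)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Apply D^{-1} to g and to both columns of U (elementwise division).
    dg = [g[i] / d[i] for i in range(n)]
    du1 = [u1[i] / d[i] for i in range(n)]
    du2 = [u2[i] / d[i] for i in range(n)]

    # S = U^T D^{-1} U (symmetric 2x2) and t = U^T D^{-1} g (length 2).
    s11, s12, s22 = dot(u1, du1), dot(u1, du2), dot(u2, du2)
    t1, t2 = dot(u1, dg), dot(u2, dg)

    # M = I + C S, the only matrix that has to be inverted.
    (c11, c12), (c21, c22) = c
    m11 = 1.0 + c11 * s11 + c12 * s12
    m12 = c11 * s12 + c12 * s22
    m21 = c21 * s11 + c22 * s12
    m22 = 1.0 + c21 * s12 + c22 * s22
    det = m11 * m22 - m12 * m21

    # y = M^{-1} C t, using the closed-form 2x2 inverse.
    r1 = c11 * t1 + c12 * t2
    r2 = c21 * t1 + c22 * t2
    y1 = (m22 * r1 - m12 * r2) / det
    y2 = (-m21 * r1 + m11 * r2) / det

    # x = D^{-1} g - D^{-1} U y
    return [dg[i] - du1[i] * y1 - du2[i] * y2 for i in range(n)]
```

Under the unit-wise block-diagonal approximation, a solve of this kind could be applied to each unit's gradient block independently, with g the ordinary gradient for that unit and x the corresponding approximate natural gradient direction. This form of the identity avoids inverting C itself, so it only requires that d has no zero entries and that the 2×2 matrix I + C UᵀD⁻¹U is nonsingular.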