Fisher Information and Natural Gradient Learning in Random Deep Networks

Shun-ichi Amari; Ryo Karakida; Masafumi Oizumi

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:694-702, 2019.

Abstract

The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, which is practically difficult. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a network with random connections. We prove that the Fisher information matrix is unit-wise block diagonal, supplemented by small-order off-block-diagonal terms. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information matrix explicitly. This yields an explicit form of the approximate natural gradient that does not rely on matrix inversion.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-amari19a,
  title     = {Fisher Information and Natural Gradient Learning in Random Deep Networks},
  author    = {Amari, Shun-ichi and Karakida, Ryo and Oizumi, Masafumi},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {694--702},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/amari19a/amari19a.pdf},
  url       = {https://proceedings.mlr.press/v89/amari19a.html},
  abstract  = {The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, which is practically difficult. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a network with random connections. We prove that the Fisher information matrix is unit-wise block diagonal, supplemented by small-order off-block-diagonal terms. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information matrix explicitly. This yields an explicit form of the approximate natural gradient that does not rely on matrix inversion.}
}

Endnote

%0 Conference Paper
%T Fisher Information and Natural Gradient Learning in Random Deep Networks
%A Shun-ichi Amari
%A Ryo Karakida
%A Masafumi Oizumi
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-amari19a
%I PMLR
%P 694--702
%U https://proceedings.mlr.press/v89/amari19a.html
%V 89
%X The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, which is practically difficult. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a network with random connections. We prove that the Fisher information matrix is unit-wise block diagonal, supplemented by small-order off-block-diagonal terms. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information matrix explicitly. This yields an explicit form of the approximate natural gradient that does not rely on matrix inversion.

APA


Amari, S., Karakida, R., & Oizumi, M. (2019). Fisher Information and Natural Gradient Learning in Random Deep Networks. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:694-702. Available from https://proceedings.mlr.press/v89/amari19a.html.
