Accelerating Natural Gradient with Higher-Order Invariance

Yang Song; Jiaming Song; Stefano Ermon

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4713-4722, 2018.

Abstract

An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-song18a,
  title = 	 {Accelerating Natural Gradient with Higher-Order Invariance},
  author =       {Song, Yang and Song, Jiaming and Ermon, Stefano},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {4713--4722},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/song18a/song18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/song18a.html},
  abstract = 	 {An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.}
}

Endnote

%0 Conference Paper
%T Accelerating Natural Gradient with Higher-Order Invariance
%A Yang Song
%A Jiaming Song
%A Stefano Ermon
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-song18a
%I PMLR
%P 4713--4722
%U https://proceedings.mlr.press/v80/song18a.html
%V 80
%X An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.

APA

Song, Y., Song, J. & Ermon, S. (2018). Accelerating Natural Gradient with Higher-Order Invariance. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4713-4722. Available from https://proceedings.mlr.press/v80/song18a.html.
