Efficient Full-Matrix Adaptive Regularization

Naman Agarwal; Brian Bullins; Xinyi Chen; Elad Hazan; Karan Singh; Cyril Zhang; Yi Zhang

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:102-110, 2019.

Abstract

Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.
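The low-rank inverse square root mentioned in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes a window of r gradients stacked as the columns of a d-by-r matrix G with r much smaller than d, and a damping constant eps; the function name, the damping form ((G G^T)^{1/2} + eps I)^{-1}, and the tolerance rel_tol are choices made here for illustration. The idea is to eigendecompose the small r-by-r Gram matrix G^T G instead of the d-by-d matrix G G^T, so the cost is roughly O(d r^2) rather than O(d^3).

```python
import numpy as np


def low_rank_inv_sqrt_apply(G, v, eps=1e-3, rel_tol=1e-12):
    """Apply ((G G^T)^{1/2} + eps*I)^{-1} to v without forming the d x d matrix.

    G has shape (d, r) with r << d; v has shape (d,).
    Hypothetical helper for illustration only.
    """
    # Work in the small r x r Gram matrix: G^T G = V diag(s^2) V^T.
    gram = G.T @ G
    eigvals, V = np.linalg.eigh(gram)

    # Keep only numerically nonzero directions to avoid dividing by ~0.
    keep = eigvals > rel_tol * eigvals.max()
    s = np.sqrt(eigvals[keep])      # nonzero singular values of G
    U = (G @ V[:, keep]) / s        # matching left singular vectors, shape (d, k)

    # On span(U) the operator scales by 1 / (s + eps); on the orthogonal
    # complement G G^T vanishes, so the operator scales by 1 / eps.
    coeffs = U.T @ v
    return v / eps + U @ ((1.0 / (s + eps) - 1.0 / eps) * coeffs)


# Example: precondition one gradient using a window of 8 past gradients.
rng = np.random.default_rng(0)
G = rng.standard_normal((1000, 8))   # stand-in for a gradient history
g = rng.standard_normal(1000)        # stand-in for the current gradient
step = low_rank_inv_sqrt_apply(G, g, eps=1e-3)
```

Only the r-by-r eigendecomposition and a few tall-skinny matrix products are needed, which is what makes a full-matrix style preconditioner affordable when r is small.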

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-agarwal19b,
  title = 	 {Efficient Full-Matrix Adaptive Regularization},
  author =       {Agarwal, Naman and Bullins, Brian and Chen, Xinyi and Hazan, Elad and Singh, Karan and Zhang, Cyril and Zhang, Yi},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {102--110},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/agarwal19b/agarwal19b.pdf},
  url = 	 {https://proceedings.mlr.press/v97/agarwal19b.html},
  abstract = 	 {Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.}
}

EndNote

%0 Conference Paper
%T Efficient Full-Matrix Adaptive Regularization
%A Naman Agarwal
%A Brian Bullins
%A Xinyi Chen
%A Elad Hazan
%A Karan Singh
%A Cyril Zhang
%A Yi Zhang
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-agarwal19b
%I PMLR
%P 102--110
%U https://proceedings.mlr.press/v97/agarwal19b.html
%V 97
%X Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.

APA

Agarwal, N., Bullins, B., Chen, X., Hazan, E., Singh, K., Zhang, C., & Zhang, Y. (2019). Efficient Full-Matrix Adaptive Regularization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:102-110. Available from https://proceedings.mlr.press/v97/agarwal19b.html.
