Efficient Full-Matrix Adaptive Regularization

Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:102-110, 2019.

Abstract

Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.
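To make the abstract's central idea concrete, the following is a minimal NumPy sketch (not taken from the paper) of how the inverse square root of a low-rank matrix can be applied efficiently: with a window of r recent gradients stacked into a d x r matrix G, the d x d preconditioner (G G^T + eps*I)^(-1/2) can be applied to a vector using only the small r x r Gram matrix G^T G. The function name low_rank_inv_sqrt_apply, the window size, and the eps value are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def low_rank_inv_sqrt_apply(G, g, eps=1e-4):
    """Apply (G G^T + eps*I)^(-1/2) to a vector g without forming the d x d matrix.

    Illustrative sketch of the low-rank trick, not the paper's exact algorithm.
    G : (d, r) matrix whose columns are recent gradients (r << d).
    g : (d,)   current gradient.
    Only the r x r Gram matrix G^T G is eigendecomposed.
    """
    # Eigendecompose the small Gram matrix: G^T G = V diag(s) V^T.
    s, V = np.linalg.eigh(G.T @ G)
    s = np.clip(s, 0.0, None)          # guard against tiny negative eigenvalues
    nz = s > 1e-12                     # keep numerically nonzero eigenvalues
    # Corresponding orthonormal eigenvectors of G G^T: U = G V diag(s^{-1/2}).
    U = G @ (V[:, nz] / np.sqrt(s[nz]))
    # Split g into its component in span(U) and the orthogonal remainder;
    # the preconditioner scales the former by (s+eps)^{-1/2} and the latter by eps^{-1/2}.
    coeffs = U.T @ g
    in_span = U @ ((s[nz] + eps) ** -0.5 * coeffs)
    orth = (g - U @ coeffs) / np.sqrt(eps)
    return in_span + orth

# Toy usage: d = 10^5 parameters, window of r = 20 gradients.
rng = np.random.default_rng(0)
G = rng.standard_normal((100_000, 20))
g = rng.standard_normal(100_000)
step = low_rank_inv_sqrt_apply(G, g)

The cost per step is O(d r^2) plus an r x r eigendecomposition, rather than the O(d^3) cost of forming and inverting the full d x d preconditioner.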

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-agarwal19b,
  title     = {Efficient Full-Matrix Adaptive Regularization},
  author    = {Agarwal, Naman and Bullins, Brian and Chen, Xinyi and Hazan, Elad and Singh, Karan and Zhang, Cyril and Zhang, Yi},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {102--110},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/agarwal19b/agarwal19b.pdf},
  url       = {https://proceedings.mlr.press/v97/agarwal19b.html},
  abstract  = {Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.}
}
Endnote
%0 Conference Paper
%T Efficient Full-Matrix Adaptive Regularization
%A Naman Agarwal
%A Brian Bullins
%A Xinyi Chen
%A Elad Hazan
%A Karan Singh
%A Cyril Zhang
%A Yi Zhang
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-agarwal19b
%I PMLR
%P 102--110
%U https://proceedings.mlr.press/v97/agarwal19b.html
%V 97
%X Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.
APA
Agarwal, N., Bullins, B., Chen, X., Hazan, E., Singh, K., Zhang, C. & Zhang, Y. (2019). Efficient Full-Matrix Adaptive Regularization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:102-110. Available from https://proceedings.mlr.press/v97/agarwal19b.html.
