Shampoo: Preconditioned Stochastic Tensor Optimization

Vineet Gupta, Tomer Koren, Yoram Singer
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1842-1850, 2018.

Abstract

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo’s runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.
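To make the per-dimension preconditioning concrete, below is a minimal NumPy sketch of the matrix (rank-2) case of the update analyzed in the paper: a left statistic is accumulated by contracting the gradient over its columns, a right statistic by contracting over its rows, and the gradient is preconditioned by the inverse fourth roots of the two statistics. The class and function names, the learning rate, and the epsilon value are illustrative assumptions for this sketch, not the authors' implementation.

import numpy as np

def inverse_pth_root(mat, p, eig_floor=1e-12):
    # mat^(-1/p) for a symmetric PSD matrix, computed via eigendecomposition.
    w, v = np.linalg.eigh(mat)
    return (v * np.maximum(w, eig_floor) ** (-1.0 / p)) @ v.T

class ShampooMatrixCase:
    # Sketch of Shampoo for a matrix-shaped parameter W of shape (n, m):
    # one preconditioner per dimension, each built by contracting the
    # gradient over the other dimension.
    def __init__(self, shape, lr=0.1, eps=1e-4):
        n, m = shape
        self.lr = lr
        self.L = eps * np.eye(n)   # left statistic, (n, n)
        self.R = eps * np.eye(m)   # right statistic, (m, m)

    def step(self, W, G):
        self.L += G @ G.T          # contract over columns
        self.R += G.T @ G          # contract over rows
        # Preconditioned update: W <- W - lr * L^(-1/4) G R^(-1/4)
        return W - self.lr * inverse_pth_root(self.L, 4) @ G @ inverse_pth_root(self.R, 4)

In this sketch a step on a weight matrix W with stochastic gradient G is simply opt.step(W, G); each preconditioner is only n-by-n or m-by-m, which is the storage saving over a full preconditioner on the flattened parameter that the abstract alludes to.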

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-gupta18a,
  title     = {Shampoo: Preconditioned Stochastic Tensor Optimization},
  author    = {Gupta, Vineet and Koren, Tomer and Singer, Yoram},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1842--1850},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/gupta18a/gupta18a.pdf},
  url       = {https://proceedings.mlr.press/v80/gupta18a.html},
  abstract  = {Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo’s runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.}
}
Endnote
%0 Conference Paper
%T Shampoo: Preconditioned Stochastic Tensor Optimization
%A Vineet Gupta
%A Tomer Koren
%A Yoram Singer
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-gupta18a
%I PMLR
%P 1842--1850
%U https://proceedings.mlr.press/v80/gupta18a.html
%V 80
%X Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo’s runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.
APA
Gupta, V., Koren, T. & Singer, Y. (2018). Shampoo: Preconditioned Stochastic Tensor Optimization. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1842-1850. Available from https://proceedings.mlr.press/v80/gupta18a.html.