Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

Jialei Wang, Weiran Wang, Nathan Srebro
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1882-1919, 2017.

Abstract

We present and analyze statistically optimal, communication- and memory-efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work, which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication-efficient methods that do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication: it still allows optimization with communication that scales only logarithmically with the desired accuracy, while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.
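To make the minibatch prox step concrete, the sketch below shows one reading of the update described in the abstract: at each iteration, rather than taking a linearized (gradient) step, one solves the regularized subproblem w_t = argmin_w (1/b) sum_{i in S_t} loss(w; z_i) + (1/(2*gamma)) ||w - w_{t-1}||^2 on a fresh minibatch S_t. This is a minimal illustrative sketch, not the paper's algorithm: the logistic loss, the use of scipy.optimize.minimize as the inner solver, and all parameter values are assumptions made for the example.

import numpy as np
from scipy.optimize import minimize

def minibatch_prox_step(w_prev, X_batch, y_batch, gamma):
    """Solve the non-linearized proximal subproblem on one minibatch (illustrative)."""
    def objective(w):
        margins = y_batch * (X_batch @ w)
        loss = np.mean(np.log1p(np.exp(-margins)))        # logistic loss on the minibatch (assumed loss)
        prox = np.sum((w - w_prev) ** 2) / (2.0 * gamma)  # proximity term to the previous iterate
        return loss + prox

    def gradient(w):
        margins = y_batch * (X_batch @ w)
        sigma = 1.0 / (1.0 + np.exp(margins))             # = exp(-m) / (1 + exp(-m))
        grad_loss = -(X_batch * (sigma * y_batch)[:, None]).mean(axis=0)
        return grad_loss + (w - w_prev) / gamma

    res = minimize(objective, w_prev, jac=gradient, method="L-BFGS-B")
    return res.x

def minibatch_prox(X, y, batch_size=64, gamma=1.0, n_iters=50, seed=0):
    """Run the minibatch prox iteration on data (X, y) with labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        w = minibatch_prox_step(w, X[idx], y[idx], gamma)
    return w

The key design point, as emphasized in the abstract, is that each subproblem keeps the full (non-linearized) minibatch loss rather than a gradient approximation of it, which is what allows larger effective step sizes than plain minibatch SGD.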

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-wang17a,
  title     = {Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox},
  author    = {Wang, Jialei and Wang, Weiran and Srebro, Nathan},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {1882--1919},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/wang17a/wang17a.pdf},
  url       = {https://proceedings.mlr.press/v65/wang17a.html}
}
Endnote
%0 Conference Paper
%T Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
%A Jialei Wang
%A Weiran Wang
%A Nathan Srebro
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-wang17a
%I PMLR
%P 1882--1919
%U https://proceedings.mlr.press/v65/wang17a.html
%V 65
APA
Wang, J., Wang, W. & Srebro, N. (2017). Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1882-1919. Available from https://proceedings.mlr.press/v65/wang17a.html.