Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

Jialei Wang, Weiran Wang, Nathan Srebro
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1882-1919, 2017.

Abstract

We present and analyze statistically optimal, communication- and memory-efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work, which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication-efficient methods that do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication: it still allows optimization with communication that scales only logarithmically with the desired accuracy, while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.
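To make the minibatch prox step concrete, the sketch below shows one reading of the update described in the abstract: at each iteration, rather than taking a linearized (gradient) step, one solves the regularized subproblem w_t = argmin_w (1/b) sum_{i in S_t} loss(w; z_i) + (1/(2*gamma)) ||w - w_{t-1}||^2 on a fresh minibatch S_t. This is a minimal illustrative sketch, not the paper's algorithm: the logistic loss, the use of scipy.optimize.minimize as the inner solver, and all parameter values are assumptions made for the example.

import numpy as np
from scipy.optimize import minimize

def minibatch_prox_step(w_prev, X_batch, y_batch, gamma):
    """Solve the non-linearized proximal subproblem on one minibatch (illustrative)."""
    def objective(w):
        margins = y_batch * (X_batch @ w)
        loss = np.mean(np.log1p(np.exp(-margins)))        # logistic loss on the minibatch (assumed loss)
        prox = np.sum((w - w_prev) ** 2) / (2.0 * gamma)  # proximity term to the previous iterate
        return loss + prox

    def gradient(w):
        margins = y_batch * (X_batch @ w)
        sigma = 1.0 / (1.0 + np.exp(margins))             # = exp(-m) / (1 + exp(-m))
        grad_loss = -(X_batch * (sigma * y_batch)[:, None]).mean(axis=0)
        return grad_loss + (w - w_prev) / gamma

    res = minimize(objective, w_prev, jac=gradient, method="L-BFGS-B")
    return res.x

def minibatch_prox(X, y, batch_size=64, gamma=1.0, n_iters=50, seed=0):
    """Run the minibatch prox iteration on data (X, y) with labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        w = minibatch_prox_step(w, X[idx], y[idx], gamma)
    return w

The key design point, as emphasized in the abstract, is that each subproblem keeps the full (non-linearized) minibatch loss rather than a gradient approximation of it, which is what allows larger effective step sizes than plain minibatch SGD.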

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-wang17a,
  title     = {Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox},
  author    = {Wang, Jialei and Wang, Weiran and Srebro, Nathan},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {1882--1919},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/wang17a/wang17a.pdf},
  url       = {https://proceedings.mlr.press/v65/wang17a.html}
}
Endnote
%0 Conference Paper
%T Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
%A Jialei Wang
%A Weiran Wang
%A Nathan Srebro
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-wang17a
%I PMLR
%P 1882--1919
%U https://proceedings.mlr.press/v65/wang17a.html
%V 65
APA
Wang, J., Wang, W. & Srebro, N. (2017). Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1882-1919. Available from https://proceedings.mlr.press/v65/wang17a.html.