Asynchronous Distributed Optimization with Stochastic Delays

Margalit R. Glasgow; Mary Wootters

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9247-9279, 2022.

Abstract

We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), little is known for the distributed-data setting. We develop ADSAGA, an algorithm based on SAGA for the distributed-data setting, in which the data is partitioned among many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA without delays and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms in which the delays are arbitrary (but bounded by $O(m)$) and the data is accessible by all machines. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wallclock performance of ADSAGA with existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.
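ADSAGA is built on SAGA, whose delay-free complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ is the baseline quoted in the abstract. As a reference point, here is a minimal sketch of standard, synchronous SAGA on a least-squares finite sum, the problem class used in the experiments. It is not ADSAGA: there is no parameter server, no partitioning of data across machines, and no delay model. The function name `saga_least_squares`, the step size $1/(3L)$ with $L = \max_i \|a_i\|^2$, and the synthetic data are illustrative assumptions.

```python
import random


def saga_least_squares(A, b, step, iters, seed=0):
    """Standard, delay-free SAGA for min_x (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    A is a list of n rows (lists of floats) and b a list of n targets.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d

    def grad(i, point):
        # Gradient of f_i(x) = 0.5 * (a_i . x - b_i)^2 is (a_i . x - b_i) * a_i.
        r = sum(aik * pk for aik, pk in zip(A[i], point)) - b[i]
        return [r * aik for aik in A[i]]

    # Table holding the last gradient evaluated for each component
    # (initialised at the starting point), plus the table's running average.
    table = [grad(i, x) for i in range(n)]
    avg = [sum(g[k] for g in table) / n for k in range(d)]

    for _ in range(iters):
        j = rng.randrange(n)  # sample one component uniformly at random
        g_new, g_old = grad(j, x), table[j]
        # SAGA direction: fresh gradient minus stored one plus table average.
        x = [xk - step * (gn - go + ak)
             for xk, gn, go, ak in zip(x, g_new, g_old, avg)]
        # Keep the average consistent, then overwrite the stored gradient.
        avg = [ak + (gn - go) / n for ak, gn, go in zip(avg, g_new, g_old)]
        table[j] = g_new
    return x


# Synthetic, consistent least-squares problem (assumed data, not from the paper).
data_rng = random.Random(1)
n, d = 50, 3
x_true = [1.0, -2.0, 0.5]
A = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
L = max(sum(a * a for a in row) for row in A)  # smoothness of the worst f_i
x_hat = saga_least_squares(A, b, step=1.0 / (3.0 * L), iters=20000)
print([round(v, 6) for v in x_hat])  # should be close to x_true
```

Each step evaluates a single component gradient, yet the stored-gradient table makes the variance of the update direction shrink as the iterates converge, which is what permits a constant step size and linear convergence. In the distributed-data setting described in the abstract, each component lives on one of the $m$ machines, so fresh gradients reach the parameter server only after a delay; handling that is what ADSAGA adds, and it is not modeled in this sketch.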

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-glasgow22b,
  title     = {Asynchronous Distributed Optimization with Stochastic Delays},
  author    = {Glasgow, Margalit R. and Wootters, Mary},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {9247--9279},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/glasgow22b/glasgow22b.pdf},
  url       = {https://proceedings.mlr.press/v151/glasgow22b.html},
  abstract  = {We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), little is known for the distributed-data setting. We develop ADSAGA, an algorithm based on SAGA for the distributed-data setting, in which the data is partitioned among many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA without delays and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms in which the delays are arbitrary (but bounded by $O(m)$) and the data is accessible by all machines. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wallclock performance of ADSAGA with existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.}
}

Endnote

%0 Conference Paper
%T Asynchronous Distributed Optimization with Stochastic Delays
%A Margalit R. Glasgow
%A Mary Wootters
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-glasgow22b
%I PMLR
%P 9247--9279
%U https://proceedings.mlr.press/v151/glasgow22b.html
%V 151
%X We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), little is known for the distributed-data setting. We develop ADSAGA, an algorithm based on SAGA for the distributed-data setting, in which the data is partitioned among many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA without delays and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms in which the delays are arbitrary (but bounded by $O(m)$) and the data is accessible by all machines. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wallclock performance of ADSAGA with existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.

APA


Glasgow, M.R. & Wootters, M. (2022). Asynchronous Distributed Optimization with Stochastic Delays. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:9247-9279. Available from https://proceedings.mlr.press/v151/glasgow22b.html.
