AdaDelay: Delay Adaptive Distributed Stochastic Optimization

Suvrit Sra; Adams Wei Yu; Mu Li; Alex Smola

AdaDelay: Delay Adaptive Distributed Stochastic Optimization

Suvrit Sra, Adams Wei Yu, Mu Li, Alex Smola

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:957-965, 2016.

Abstract

We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable improvements for large-scale real datasets with billions of examples and features.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-sra16,
  title = 	 {AdaDelay: Delay Adaptive Distributed Stochastic Optimization},
  author = 	 {Sra, Suvrit and Yu, Adams Wei and Li, Mu and Smola, Alex},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {957--965},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/sra16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/sra16.html},
  abstract = 	 {We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable  improvements for large-scale real datasets with billions of examples and features.}
}

Endnote

%0 Conference Paper
%T AdaDelay: Delay Adaptive Distributed Stochastic Optimization
%A Suvrit Sra
%A Adams Wei Yu
%A Mu Li
%A Alex Smola
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-sra16
%I PMLR
%P 957--965
%U https://proceedings.mlr.press/v51/sra16.html
%V 51
%X We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable  improvements for large-scale real datasets with billions of examples and features.

RIS


TY  - CPAPER
TI  - AdaDelay: Delay Adaptive Distributed Stochastic Optimization
AU  - Suvrit Sra
AU  - Adams Wei Yu
AU  - Mu Li
AU  - Alex Smola
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-sra16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 957
EP  - 965
L1  - http://proceedings.mlr.press/v51/sra16.pdf
UR  - https://proceedings.mlr.press/v51/sra16.html
AB  - We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable  improvements for large-scale real datasets with billions of examples and features.
ER  -

APA


Sra, S., Yu, A.W., Li, M. & Smola, A.. (2016). AdaDelay: Delay Adaptive Distributed Stochastic Optimization. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:957-965 Available from https://proceedings.mlr.press/v51/sra16.html.

AdaDelay: Delay Adaptive Distributed Stochastic Optimization

Abstract

Cite this Paper

Related Material