Delay-Adaptive Step-sizes for Asynchronous Learning

Xuyang Wu; Sindri Magnusson; Hamid Reza Feyzmahdavian; Mikael Johansson

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24093-24113, 2022.

Abstract

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
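To make the idea of a learning rate that depends on the actual, measured delay more concrete, here is a minimal Python sketch. It is not the authors' method. It simulates asynchronous gradient descent on a one-dimensional quadratic and measures each update's delay online as the difference between the current iteration counter and the counter recorded when the worker read the model. It then applies the hypothetical rule gamma_k = gamma0 / (1 + tau_k). The function names, the quadratic objective, the uniformly random delay model, and this step-size rule are all illustrative assumptions. The policies the paper derives for proximal incremental gradient descent and block coordinate descent are not reproduced here.

```python
import random


def delay_adaptive_step(gamma0: float, tau: int) -> float:
    """Hypothetical delay-adaptive rule (illustrative only, not the paper's policy):
    shrink the step-size as the measured delay tau grows."""
    return gamma0 / (1.0 + tau)


def simulate_async_gd(x_star=3.0, gamma0=0.2, max_delay=5, iters=300,
                      adaptive=True, seed=0):
    """Simulate asynchronous gradient descent on f(x) = 0.5 * (x - x_star)**2.

    Each update uses a gradient evaluated at a stale iterate. The delay tau_k
    is measured online as the difference between the current iteration
    counter k and the counter s_k recorded when the worker read the model.
    The random delay model is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    iterates = [0.0]                       # iterates[k] is x_k, with x_0 = 0
    delays = []
    for k in range(iters):
        # The worker whose update arrives now read the model at iteration s_k <= k.
        s_k = max(0, k - rng.randint(0, max_delay))
        tau_k = k - s_k                     # delay measured from the two counters
        grad = iterates[s_k] - x_star       # gradient of f at the stale iterate
        if adaptive:
            gamma_k = delay_adaptive_step(gamma0, tau_k)
        else:
            # Conventional choice: size every step for the worst-case delay bound.
            gamma_k = gamma0 / (1.0 + max_delay)
        iterates.append(iterates[k] - gamma_k * grad)
        delays.append(tau_k)
    return iterates, delays


if __name__ == "__main__":
    xs, taus = simulate_async_gd()
    print(f"final iterate: {xs[-1]:.6f}, max measured delay: {max(taus)}")
```

Setting adaptive=False sizes every step for the upper bound max_delay, which mirrors the delay-bound analyses the abstract contrasts with. Under the hypothetical rule, any update whose measured delay is below that bound takes a strictly larger step than the bound-based choice would allow.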

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-wu22g,
  title     = {Delay-Adaptive Step-sizes for Asynchronous Learning},
  author    = {Wu, Xuyang and Magnusson, Sindri and Feyzmahdavian, Hamid Reza and Johansson, Mikael},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {24093--24113},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wu22g/wu22g.pdf},
  url       = {https://proceedings.mlr.press/v162/wu22g.html},
  abstract  = {In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.}
}

Endnote

%0 Conference Paper
%T Delay-Adaptive Step-sizes for Asynchronous Learning
%A Xuyang Wu
%A Sindri Magnusson
%A Hamid Reza Feyzmahdavian
%A Mikael Johansson
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wu22g
%I PMLR
%P 24093--24113
%U https://proceedings.mlr.press/v162/wu22g.html
%V 162
%X In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.

APA


Wu, X., Magnusson, S., Feyzmahdavian, H.R. & Johansson, M. (2022). Delay-Adaptive Step-sizes for Asynchronous Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:24093-24113. Available from https://proceedings.mlr.press/v162/wu22g.html.
