Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction

Kenta Niwa; Guoqiang Zhang; W. Bastiaan Kleijn; Noboru Harada; Hiroshi Sawada; Akinori Fujino

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8195-8204, 2021.

Abstract

A novel asynchronous decentralized optimization method that follows Stochastic Variance Reduction (SVR) is proposed. Average consensus algorithms, such as Decentralized Stochastic Gradient Descent (DSGD), facilitate distributed training of machine learning models. However, the gradient will drift within the local nodes due to statistical heterogeneity of the subsets of data residing on the nodes and long communication intervals. To overcome the drift problem, (i) Gradient Tracking-SVR (GT-SVR) integrates SVR into DSGD and (ii) Edge-Consensus Learning (ECL) solves a model-constrained minimization problem using a primal-dual formalism. In this paper, we reformulate the update procedure of ECL such that it implicitly includes the gradient modification of SVR by optimally selecting a constraint-strength control parameter. Through convergence analysis and experiments, we confirmed that the proposed ECL with Implicit SVR (ECL-ISVR) is stable and approximately reaches the reference performance obtained with computation on a single node using the full data set.
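
The drift problem mentioned in the abstract can be illustrated with a minimal sketch of plain DSGD. This is not the paper's ECL-ISVR method. It assumes a toy setup that does not appear in the paper: five nodes on a ring, each holding a scalar quadratic loss `0.5 * (w - target)^2` with a different target (standing in for heterogeneous local data), full local gradients instead of stochastic ones, and one common DSGD update order (mix with neighbours, then take a local gradient step). All function and variable names are hypothetical.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring: each node averages
    itself and its two neighbours with weight 1/3 each."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def dsgd(targets, lr=0.1, steps=500):
    """Run DSGD on local losses f_i(w) = 0.5 * (w - targets[i])**2.

    Assumption: full local gradients are used so the run is deterministic.
    """
    n = len(targets)
    W = ring_mixing_matrix(n)
    w = [0.0] * n  # one scalar model per node
    for _ in range(steps):
        # Local gradient of each node's own loss at its own model.
        grads = [w[i] - targets[i] for i in range(n)]
        # Average-consensus step: mix models with ring neighbours.
        mixed = [sum(W[i][j] * w[j] for j in range(n)) for i in range(n)]
        # Local descent step with a constant step size.
        w = [mixed[i] - lr * grads[i] for i in range(n)]
    return w


# Heterogeneous local optima play the role of non-identical local data.
targets = [-4.0, -1.0, 0.0, 2.0, 8.0]
models = dsgd(targets)

global_opt = sum(targets) / len(targets)   # minimizer of the summed loss
mean_model = sum(models) / len(models)     # network-average model
spread = max(models) - min(models)         # disagreement between nodes

print("global optimum:", global_opt)
print("mean of node models:", mean_model)
print("spread across nodes:", spread)
```

In this toy setting the network-average model reaches the global optimum, but with a constant step size the individual nodes settle at different values, each pulled toward its own local optimum. That residual disagreement is a simple analogue of the drift that GT-SVR and ECL are designed to counter.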

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-niwa21a,
  title     = {Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction},
  author    = {Niwa, Kenta and Zhang, Guoqiang and Kleijn, W. Bastiaan and Harada, Noboru and Sawada, Hiroshi and Fujino, Akinori},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8195--8204},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/niwa21a/niwa21a.pdf},
  url       = {https://proceedings.mlr.press/v139/niwa21a.html},
  abstract  = {A novel asynchronous decentralized optimization method that follows Stochastic Variance Reduction (SVR) is proposed. Average consensus algorithms, such as Decentralized Stochastic Gradient Descent (DSGD), facilitate distributed training of machine learning models. However, the gradient will drift within the local nodes due to statistical heterogeneity of the subsets of data residing on the nodes and long communication intervals. To overcome the drift problem, (i) Gradient Tracking-SVR (GT-SVR) integrates SVR into DSGD and (ii) Edge-Consensus Learning (ECL) solves a model constrained minimization problem using a primal-dual formalism. In this paper, we reformulate the update procedure of ECL such that it implicitly includes the gradient modification of SVR by optimally selecting a constraint-strength control parameter. Through convergence analysis and experiments, we confirmed that the proposed ECL with Implicit SVR (ECL-ISVR) is stable and approximately reaches the reference performance obtained with computation on a single-node using full data set.}
}

Endnote

%0 Conference Paper
%T Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction
%A Kenta Niwa
%A Guoqiang Zhang
%A W. Bastiaan Kleijn
%A Noboru Harada
%A Hiroshi Sawada
%A Akinori Fujino
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-niwa21a
%I PMLR
%P 8195--8204
%U https://proceedings.mlr.press/v139/niwa21a.html
%V 139
%X A novel asynchronous decentralized optimization method that follows Stochastic Variance Reduction (SVR) is proposed. Average consensus algorithms, such as Decentralized Stochastic Gradient Descent (DSGD), facilitate distributed training of machine learning models. However, the gradient will drift within the local nodes due to statistical heterogeneity of the subsets of data residing on the nodes and long communication intervals. To overcome the drift problem, (i) Gradient Tracking-SVR (GT-SVR) integrates SVR into DSGD and (ii) Edge-Consensus Learning (ECL) solves a model constrained minimization problem using a primal-dual formalism. In this paper, we reformulate the update procedure of ECL such that it implicitly includes the gradient modification of SVR by optimally selecting a constraint-strength control parameter. Through convergence analysis and experiments, we confirmed that the proposed ECL with Implicit SVR (ECL-ISVR) is stable and approximately reaches the reference performance obtained with computation on a single-node using full data set.

APA


Niwa, K., Zhang, G., Kleijn, W. B., Harada, N., Sawada, H., & Fujino, A. (2021). Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8195-8204. Available from https://proceedings.mlr.press/v139/niwa21a.html.
