On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

Yi Zhou; Yaoliang Yu; Wei Dai; Yingbin Liang; Eric Xing

On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

Yi Zhou, Yaoliang Yu, Wei Dai, Yingbin Liang, Eric Xing

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:713-722, 2016.

Abstract

With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-zhou16,
  title = 	 {On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System},
  author = 	 {Zhou, Yi and Yu, Yaoliang and Dai, Wei and Liang, Yingbin and Xing, Eric},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {713--722},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/zhou16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/zhou16.html},
  abstract = 	 {With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.}
}

Endnote

%0 Conference Paper
%T On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System
%A Yi Zhou
%A Yaoliang Yu
%A Wei Dai
%A Yingbin Liang
%A Eric Xing
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-zhou16
%I PMLR
%P 713--722
%U https://proceedings.mlr.press/v51/zhou16.html
%V 51
%X With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.

RIS


TY  - CPAPER
TI  - On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System
AU  - Yi Zhou
AU  - Yaoliang Yu
AU  - Wei Dai
AU  - Yingbin Liang
AU  - Eric Xing
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-zhou16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 713
EP  - 722
L1  - http://proceedings.mlr.press/v51/zhou16.pdf
UR  - https://proceedings.mlr.press/v51/zhou16.html
AB  - With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.
ER  -

APA


Zhou, Y., Yu, Y., Dai, W., Liang, Y. & Xing, E.. (2016). On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:713-722 Available from https://proceedings.mlr.press/v51/zhou16.html.

On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

Abstract

Cite this Paper

Related Material