On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

Yi Zhou, Yaoliang Yu, Wei Dai, Yingbin Liang, Eric Xing
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:713-722, 2016.

Abstract

With ever-growing data volumes and model sizes, an error-tolerant, communication-efficient, yet versatile parallel algorithm has become vital to the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model-parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they do not drift too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the convergence properties of mspg; a salient feature of our analysis is its generality, which seamlessly accommodates both nonsmooth and nonconvex functions. Under mild conditions, we prove that the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg that completely bypasses the need to keep a local full model. We confirm our theoretical findings through numerical experiments.
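To make the algorithmic ingredients concrete, the sketch below shows one block-coordinate proximal gradient update of the kind each mspg worker performs: a gradient step on the worker's own block of coordinates, evaluated at a possibly stale copy of the model, followed by a proximal step. This is a minimal single-process Python illustration, not the authors' parameter-server implementation; the l1 regularizer, the least-squares objective, and the function names (soft_threshold, worker_step) are assumptions made for the example.

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (illustrative choice of regularizer).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def worker_step(x_stale, block, grad_f, step, lam):
    # One model-parallel proximal gradient update: the worker owning
    # `block` takes a gradient step on its own coordinates, with the
    # gradient evaluated at a possibly stale copy of the full model
    # (the stale synchronous setting bounds how old x_stale may be),
    # then applies the proximal operator. Hypothetical sketch only.
    g = grad_f(x_stale)
    z = x_stale[block] - step * g[block]
    return soft_threshold(z, step * lam)

# Toy usage: lasso-style objective 0.5*||Ax - b||^2 + lam*||x||_1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - b)
x = np.zeros(20)
blocks = [np.arange(0, 10), np.arange(10, 20)]  # two "workers"
for _ in range(300):
    for blk in blocks:  # sequential stand-in for asynchronous workers
        x[blk] = worker_step(x, blk, grad_f, step=1e-2, lam=0.1)

In the actual stale synchronous system, each worker would pull the (possibly stale) model from the parameter server and push back only its own block, which is how mspg avoids keeping a local full model.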

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-zhou16,
  title     = {On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System},
  author    = {Yi Zhou and Yaoliang Yu and Wei Dai and Yingbin Liang and Eric Xing},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {713--722},
  year      = {2016},
  editor    = {Arthur Gretton and Christian C. Robert},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/zhou16.pdf},
  url       = {http://proceedings.mlr.press/v51/zhou16.html},
  abstract  = {With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.}
}
Endnote
%0 Conference Paper
%T On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System
%A Yi Zhou
%A Yaoliang Yu
%A Wei Dai
%A Yingbin Liang
%A Eric Xing
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-zhou16
%I PMLR
%J Proceedings of Machine Learning Research
%P 713--722
%U http://proceedings.mlr.press
%V 51
%W PMLR
%X With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.
RIS
TY - CPAPER
TI - On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System
AU - Yi Zhou
AU - Yaoliang Yu
AU - Wei Dai
AU - Yingbin Liang
AU - Eric Xing
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
PY - 2016/05/02
DA - 2016/05/02
ED - Arthur Gretton
ED - Christian C. Robert
ID - pmlr-v51-zhou16
PB - PMLR
SP - 713
EP - 722
DP - PMLR
L1 - http://proceedings.mlr.press/v51/zhou16.pdf
UR - http://proceedings.mlr.press/v51/zhou16.html
AB - With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.
ER -
APA
Zhou, Y., Yu, Y., Dai, W., Liang, Y. & Xing, E. (2016). On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in PMLR 51:713-722.
