On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

Yi Zhou, Yaoliang Yu, Wei Dai, Yingbin Liang, Eric Xing
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:713-722, 2016.

Abstract

With ever-growing data volumes and model sizes, error-tolerant, communication-efficient, yet versatile parallel algorithms have become vital to the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model-parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they do not drift too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the convergence properties of mspg; a salient feature of our analysis is its generality, which seamlessly accommodates both nonsmooth and nonconvex functions. Under mild conditions, we prove that the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg that completely bypasses the need to keep a local full model. We confirm our theoretical findings through numerical experiments.
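The paper's update rule is not reproduced on this page, but the ingredients the abstract names, proximal gradient steps, a model partitioned across workers, and bounded staleness mediated by a parameter server, can be illustrated with a small single-process simulation. The sketch below is a toy for the lasso problem under stated assumptions, not the authors' mspg implementation; all names and parameters (prox_l1, model_parallel_prox_grad, staleness, etc.) are invented for illustration.

import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def model_parallel_prox_grad(A, b, lam=0.1, eta=None, n_workers=4,
                             staleness=2, n_iters=100):
    """Simulated model-parallel proximal gradient for the lasso:
        min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.
    Coordinates of x are partitioned into blocks, one per worker; each
    worker updates its block using a model snapshot that may be up to
    `staleness` iterations old, mimicking a stale synchronous
    parameter server."""
    m, d = A.shape
    if eta is None:
        # 1/L step size, with L = ||A||_2^2 the Lipschitz constant of
        # the gradient of f(x) = 0.5 * ||Ax - b||^2
        eta = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_workers)
    history = [x.copy()]  # past model snapshots held by the "server"
    for k in range(n_iters):
        x_new = x.copy()
        for blk in blocks:
            # each worker reads a snapshot at most `staleness` steps old
            lag = np.random.randint(0, staleness + 1)
            x_stale = history[max(0, len(history) - 1 - lag)]
            # gradient of f restricted to this worker's coordinates
            grad_blk = A[:, blk].T @ (A @ x_stale - b)
            x_new[blk] = prox_l1(x_stale[blk] - eta * grad_blk, eta * lam)
        x = x_new
        history.append(x.copy())
    return x

For a separable regularizer such as the l1 norm, the proximal operator decomposes across coordinate blocks, which is what lets each worker apply it to its own block independently, so no worker ever needs the full model locally.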

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-zhou16,
  title     = {On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System},
  author    = {Zhou, Yi and Yu, Yaoliang and Dai, Wei and Liang, Yingbin and Xing, Eric},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {713--722},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/zhou16.pdf},
  url       = {https://proceedings.mlr.press/v51/zhou16.html}
}
APA
Zhou, Y., Yu, Y., Dai, W., Liang, Y. & Xing, E. (2016). On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:713-722. Available from https://proceedings.mlr.press/v51/zhou16.html.
