Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data

Abhimanu Kumar, Alex Beutel, Qirong Ho, Eric Xing
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:531-539, 2014.

Abstract

We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. Unlike algorithms that focus on distributed learning in either the big data or big model setting (but not both), our scheme partitions both the data and model variables simultaneously. This not only leads to faster learning on distributed clusters, but also enables machine learning applications where both data and model are too large to fit within the memory of a single machine. Furthermore, our scheme allows worker machines to perform additional updates while waiting for slow workers to finish, which provides users with a tunable synchronization strategy that can be set based on learning needs and cluster conditions. We prove the correctness of such strategies, as well as provide bounds on the variance of the model variables under our scheme. Finally, we present empirical results for latent space models such as topic models, which demonstrate that our method scales well with large data and model sizes, while beating learning strategies that fail to take both data and model partitioning into account.
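The core idea of partitioning both data and model so that workers update disjoint model blocks in parallel can be illustrated with a minimal sketch. This is not the paper's implementation; it is a simplified block-partitioned SGD for matrix factorization (in the spirit of the block scheduling the paper builds on), with all names and structure chosen here for illustration only.

```python
import numpy as np

def block_schedule(P):
    """Yield, for each sub-epoch, the (row-block, col-block) pairs that
    P workers could process in parallel: within a sub-epoch no two pairs
    share a row block or a column block, so model updates are disjoint."""
    for t in range(P):
        yield [(p, (p + t) % P) for p in range(P)]

def sgd_epoch(X, U, V, P, lr=0.02):
    """One epoch of block-partitioned SGD on X ~= U @ V.T.
    Data X and model factors U, V are partitioned into P blocks each."""
    n, m = X.shape
    rb = np.array_split(np.arange(n), P)  # row-index blocks (blocks of U)
    cb = np.array_split(np.arange(m), P)  # col-index blocks (blocks of V)
    for subepoch in block_schedule(P):
        # Each (i, j) pair in this sub-epoch touches disjoint rows of U
        # and V, so this inner loop could run on P workers concurrently.
        for i, j in subepoch:
            for r in rb[i]:
                for c in cb[j]:
                    err = X[r, c] - U[r] @ V[c]
                    U[r] += lr * err * V[c]
                    V[c] += lr * err * U[r]
    return U, V
```

In this sketch, a fast worker finishing its (row, column) block early could simply take more SGD passes over that same block before the sub-epoch barrier, which is the kind of slow-worker-agnostic extra work the abstract describes; the paper's contribution includes proving that such strategies remain correct.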

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-kumar14,
  title     = {{Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data}},
  author    = {Abhimanu Kumar and Alex Beutel and Qirong Ho and Eric Xing},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages     = {531--539},
  year      = {2014},
  editor    = {Samuel Kaski and Jukka Corander},
  volume    = {33},
  series    = {Proceedings of Machine Learning Research},
  address   = {Reykjavik, Iceland},
  month     = {22--25 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v33/kumar14.pdf},
  url       = {http://proceedings.mlr.press/v33/kumar14.html},
  abstract  = {We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. Unlike algorithms that focus on distributed learning in either the big data or big model setting (but not both), our scheme partitions both the data and model variables simultaneously. This not only leads to faster learning on distributed clusters, but also enables machine learning applications where both data and model are too large to fit within the memory of a single machine. Furthermore, our scheme allows worker machines to perform additional updates while waiting for slow workers to finish, which provides users with a tunable synchronization strategy that can be set based on learning needs and cluster conditions. We prove the correctness of such strategies, as well as provide bounds on the variance of the model variables under our scheme. Finally, we present empirical results for latent space models such as topic models, which demonstrate that our method scales well with large data and model sizes, while beating learning strategies that fail to take both data and model partitioning into account.}
}
Endnote
%0 Conference Paper
%T Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
%A Abhimanu Kumar
%A Alex Beutel
%A Qirong Ho
%A Eric Xing
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-kumar14
%I PMLR
%P 531--539
%U http://proceedings.mlr.press/v33/kumar14.html
%V 33
%X We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. Unlike algorithms that focus on distributed learning in either the big data or big model setting (but not both), our scheme partitions both the data and model variables simultaneously. This not only leads to faster learning on distributed clusters, but also enables machine learning applications where both data and model are too large to fit within the memory of a single machine. Furthermore, our scheme allows worker machines to perform additional updates while waiting for slow workers to finish, which provides users with a tunable synchronization strategy that can be set based on learning needs and cluster conditions. We prove the correctness of such strategies, as well as provide bounds on the variance of the model variables under our scheme. Finally, we present empirical results for latent space models such as topic models, which demonstrate that our method scales well with large data and model sizes, while beating learning strategies that fail to take both data and model partitioning into account.
RIS
TY  - CPAPER
TI  - Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
AU  - Abhimanu Kumar
AU  - Alex Beutel
AU  - Qirong Ho
AU  - Eric Xing
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-kumar14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 531
EP  - 539
L1  - http://proceedings.mlr.press/v33/kumar14.pdf
UR  - http://proceedings.mlr.press/v33/kumar14.html
AB  - We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. Unlike algorithms that focus on distributed learning in either the big data or big model setting (but not both), our scheme partitions both the data and model variables simultaneously. This not only leads to faster learning on distributed clusters, but also enables machine learning applications where both data and model are too large to fit within the memory of a single machine. Furthermore, our scheme allows worker machines to perform additional updates while waiting for slow workers to finish, which provides users with a tunable synchronization strategy that can be set based on learning needs and cluster conditions. We prove the correctness of such strategies, as well as provide bounds on the variance of the model variables under our scheme. Finally, we present empirical results for latent space models such as topic models, which demonstrate that our method scales well with large data and model sizes, while beating learning strategies that fail to take both data and model partitioning into account.
ER  -
APA
Kumar, A., Beutel, A., Ho, Q. &amp; Xing, E. (2014). Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:531-539. Available from http://proceedings.mlr.press/v33/kumar14.html.