Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data

Abhimanu Kumar; Alex Beutel; Qirong Ho; Eric Xing

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:531-539, 2014.

Abstract

We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. Unlike algorithms that focus on distributed learning in either the big data or big model setting (but not both), our scheme partitions both the data and model variables simultaneously. This not only leads to faster learning on distributed clusters, but also enables machine learning applications where both data and model are too large to fit within the memory of a single machine. Furthermore, our scheme allows worker machines to perform additional updates while waiting for slow workers to finish, which provides users with a tunable synchronization strategy that can be set based on learning needs and cluster conditions. We prove the correctness of such strategies, as well as provide bounds on the variance of the model variables under our scheme. Finally, we present empirical results for latent space models such as topic models, which demonstrate that our method scales well with large data and model sizes, while beating learning strategies that fail to take both data and model partitioning into account.

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-kumar14,
  title = 	 {{Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data}},
  author = 	 {Kumar, Abhimanu and Beutel, Alex and Ho, Qirong and Xing, Eric},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {531--539},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/kumar14.pdf},
  url = 	 {https://proceedings.mlr.press/v33/kumar14.html},
  abstract = 	 {We present a scheme for fast, distributed learning on  big (i.e. high-dimensional) models applied to big datasets.  Unlike algorithms that focus on distributed learning in either the big data or big model setting  (but not both), our scheme partitions both the data and model variables  simultaneously. This not only leads to faster learning on distributed clusters,  but also enables machine learning applications where both data  and model are too large to fit within the memory of a single machine. Furthermore, our scheme  allows worker machines to perform additional updates while waiting for slow workers to finish,  which provides users with a tunable synchronization strategy that can  be set based on learning needs and cluster conditions.  We prove the correctness of such strategies, as well as provide  bounds on the variance of the model variables under our scheme.  Finally, we present empirical results for latent space models such  as topic models, which demonstrate that our method  scales well with large data and model sizes, while beating  learning strategies that fail to take both data and model partitioning into account.}
}

Endnote

%0 Conference Paper
%T Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
%A Abhimanu Kumar
%A Alex Beutel
%A Qirong Ho
%A Eric Xing
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-kumar14
%I PMLR
%P 531--539
%U https://proceedings.mlr.press/v33/kumar14.html
%V 33
%X We present a scheme for fast, distributed learning on  big (i.e. high-dimensional) models applied to big datasets.  Unlike algorithms that focus on distributed learning in either the big data or big model setting  (but not both), our scheme partitions both the data and model variables  simultaneously. This not only leads to faster learning on distributed clusters,  but also enables machine learning applications where both data  and model are too large to fit within the memory of a single machine. Furthermore, our scheme  allows worker machines to perform additional updates while waiting for slow workers to finish,  which provides users with a tunable synchronization strategy that can  be set based on learning needs and cluster conditions.  We prove the correctness of such strategies, as well as provide  bounds on the variance of the model variables under our scheme.  Finally, we present empirical results for latent space models such  as topic models, which demonstrate that our method  scales well with large data and model sizes, while beating  learning strategies that fail to take both data and model partitioning into account.

RIS


TY  - CPAPER
TI  - Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
AU  - Abhimanu Kumar
AU  - Alex Beutel
AU  - Qirong Ho
AU  - Eric Xing
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-kumar14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 531
EP  - 539
L1  - http://proceedings.mlr.press/v33/kumar14.pdf
UR  - https://proceedings.mlr.press/v33/kumar14.html
AB  - We present a scheme for fast, distributed learning on  big (i.e. high-dimensional) models applied to big datasets.  Unlike algorithms that focus on distributed learning in either the big data or big model setting  (but not both), our scheme partitions both the data and model variables  simultaneously. This not only leads to faster learning on distributed clusters,  but also enables machine learning applications where both data  and model are too large to fit within the memory of a single machine. Furthermore, our scheme  allows worker machines to perform additional updates while waiting for slow workers to finish,  which provides users with a tunable synchronization strategy that can  be set based on learning needs and cluster conditions.  We prove the correctness of such strategies, as well as provide  bounds on the variance of the model variables under our scheme.  Finally, we present empirical results for latent space models such  as topic models, which demonstrate that our method  scales well with large data and model sizes, while beating  learning strategies that fail to take both data and model partitioning into account.
ER  -

APA


Kumar, A., Beutel, A., Ho, Q. & Xing, E. (2014). Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:531-539. Available from https://proceedings.mlr.press/v33/kumar14.html.
