Distributed Stochastic Gradient MCMC

Sungjin Ahn, Babak Shahbaba, Max Welling
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1044-1052, 2014.

Abstract

Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw minibatches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and PubMed show that, relative to the state of the art in distributed MCMC, we reduce compute time from 27 hours to half an hour to reach the same perplexity level.
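
For a concrete picture of the scheme the abstract describes, the following is a minimal sketch, not the paper's exact algorithm: each worker holds a local shard of the data, a chain performs stochastic gradient Langevin dynamics (SGLD) updates using minibatches drawn from that shard, and only occasionally hops to another worker, so communication happens once per visit rather than once per update. The model (a one-dimensional Gaussian mean), the shard layout, the round-robin hopping schedule, and all parameter values are illustrative assumptions; the paper's adaptive load balancing is omitted.

    # Sketch of distributed SGLD: chains stay on a worker's local data shard
    # for many updates, then hop. All settings below are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: N observations from N(true_mu, 1), split across workers.
    true_mu, N, num_workers = 2.0, 10_000, 4
    data = rng.normal(true_mu, 1.0, size=N)
    shards = np.array_split(data, num_workers)   # each worker's local data pool

    def sgld_step(theta, shard, step_size, minibatch_size, total_n):
        """One SGLD update using a minibatch drawn from the worker's local shard."""
        batch = rng.choice(shard, size=minibatch_size, replace=False)
        # Log-posterior gradient for a N(theta, 1) likelihood with a N(0, 10^2) prior,
        # with the minibatch gradient rescaled to the full data set size.
        grad_log_prior = -theta / 100.0
        grad_log_lik = (total_n / minibatch_size) * np.sum(batch - theta)
        noise = rng.normal(0.0, np.sqrt(step_size))
        return theta + 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise

    theta, samples = 0.0, []
    step_size, minibatch_size, steps_per_visit, num_rounds = 1e-5, 50, 100, 20

    for r in range(num_rounds):
        worker = r % num_workers              # round-robin hop between workers
        for _ in range(steps_per_visit):      # stay local: no communication here
            theta = sgld_step(theta, shards[worker], step_size, minibatch_size, N)
            samples.append(theta)

    print(f"posterior mean estimate: {np.mean(samples[len(samples)//2:]):.3f} "
          f"(true mean {true_mu})")

Because each chain touches only its local shard between hops, the communication cost scales with the number of visits rather than the number of gradient updates, which is the property the abstract highlights.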

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-ahn14,
  title     = {Distributed Stochastic Gradient MCMC},
  author    = {Ahn, Sungjin and Shahbaba, Babak and Welling, Max},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1044--1052},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/ahn14.pdf},
  url       = {https://proceedings.mlr.press/v32/ahn14.html},
  abstract  = {Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw minibatches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and Pubmed show that relative to the state of the art in distributed MCMC we reduce compute time from 27 hours to half an hour in order to reach the same perplexity level.}
}
Endnote
%0 Conference Paper
%T Distributed Stochastic Gradient MCMC
%A Sungjin Ahn
%A Babak Shahbaba
%A Max Welling
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-ahn14
%I PMLR
%P 1044--1052
%U https://proceedings.mlr.press/v32/ahn14.html
%V 32
%N 2
%X Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw minibatches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and Pubmed show that relative to the state of the art in distributed MCMC we reduce compute time from 27 hours to half an hour in order to reach the same perplexity level.
RIS
TY - CPAPER
TI - Distributed Stochastic Gradient MCMC
AU - Sungjin Ahn
AU - Babak Shahbaba
AU - Max Welling
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/06/18
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-ahn14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 2
SP - 1044
EP - 1052
L1 - http://proceedings.mlr.press/v32/ahn14.pdf
UR - https://proceedings.mlr.press/v32/ahn14.html
AB - Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw minibatches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and Pubmed show that relative to the state of the art in distributed MCMC we reduce compute time from 27 hours to half an hour in order to reach the same perplexity level.
ER -
APA
Ahn, S., Shahbaba, B. & Welling, M. (2014). Distributed Stochastic Gradient MCMC. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1044-1052. Available from https://proceedings.mlr.press/v32/ahn14.html.

Related Material

Download PDF: http://proceedings.mlr.press/v32/ahn14.pdf