Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Han Xiao, Thomas Stibor
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:63-78, 2010.

Abstract

Collapsed Gibbs sampling is a frequently applied method for approximating intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling method, however, has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy that significantly improves the efficiency of collapsed Gibbs sampling. The strategy is explored in terms of efficiency, convergence and perplexity. In addition, we present a straightforward parallelization to further improve efficiency. Finally, we underpin our proposed improvements with a comparative study on data sets of different scales.
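
For context on the baseline the paper improves: in standard collapsed Gibbs sampling for LDA, each token's topic assignment z_i is resampled from its full conditional, p(z_i = k | z_{-i}, w) proportional to (n_{k,w} + beta) / (n_k + V*beta) * (n_{d,k} + alpha), which costs O(K) per token and dominates the runtime on large corpora. The following is a minimal, illustrative Python sketch of this standard sampler, not the authors' dynamic strategy; all names and default values are assumptions for illustration, not taken from the paper.

    import numpy as np

    def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=100, seed=0):
        """Standard (non-dynamic) collapsed Gibbs sampler for LDA.

        docs : list of documents, each a list of word ids in [0, V)
        K    : number of topics
        V    : vocabulary size
        """
        rng = np.random.default_rng(seed)
        D = len(docs)
        n_kw = np.zeros((K, V))  # count: word w assigned to topic k
        n_dk = np.zeros((D, K))  # count: topic k used in document d
        n_k = np.zeros(K)        # count: total assignments to topic k
        z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init

        # build count tables from the random initialization
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    # remove the current assignment from the counts
                    k = z[d][i]
                    n_kw[k, w] -= 1; n_dk[d, k] -= 1; n_k[k] -= 1
                    # full conditional over all K topics (the O(K) bottleneck);
                    # the doc-length denominator is constant in k and omitted
                    p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                    k = rng.choice(K, p=p / p.sum())
                    # record the new assignment
                    n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
                    z[d][i] = k
        return z, n_kw, n_dk

For example, collapsed_gibbs_lda([[0, 2, 1], [3, 0]], K=2, V=4) runs the sampler on a toy two-document corpus. The O(K) per-token conditional in the inner loop is precisely the cost that the paper's dynamic sampling strategy targets.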

Cite this Paper

BibTeX
@InProceedings{pmlr-v13-xiao10a,
  title     = {Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation},
  author    = {Xiao, Han and Stibor, Thomas},
  booktitle = {Proceedings of 2nd Asian Conference on Machine Learning},
  pages     = {63--78},
  year      = {2010},
  editor    = {Sugiyama, Masashi and Yang, Qiang},
  volume    = {13},
  series    = {Proceedings of Machine Learning Research},
  address   = {Tokyo, Japan},
  month     = {08--10 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v13/xiao10a/xiao10a.pdf},
  url       = {https://proceedings.mlr.press/v13/xiao10a.html},
  abstract  = {Collapsed Gibbs sampling is a frequently applied method for approximating intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling method, however, has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy that significantly improves the efficiency of collapsed Gibbs sampling. The strategy is explored in terms of efficiency, convergence and perplexity. In addition, we present a straightforward parallelization to further improve efficiency. Finally, we underpin our proposed improvements with a comparative study on data sets of different scales.}
}
Endnote
%0 Conference Paper
%T Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation
%A Han Xiao
%A Thomas Stibor
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-xiao10a
%I PMLR
%P 63--78
%U https://proceedings.mlr.press/v13/xiao10a.html
%V 13
%X Collapsed Gibbs sampling is a frequently applied method for approximating intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling method, however, has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy that significantly improves the efficiency of collapsed Gibbs sampling. The strategy is explored in terms of efficiency, convergence and perplexity. In addition, we present a straightforward parallelization to further improve efficiency. Finally, we underpin our proposed improvements with a comparative study on data sets of different scales.
RIS
TY - CPAPER
TI - Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation
AU - Han Xiao
AU - Thomas Stibor
BT - Proceedings of 2nd Asian Conference on Machine Learning
DA - 2010/10/31
ED - Masashi Sugiyama
ED - Qiang Yang
ID - pmlr-v13-xiao10a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 13
SP - 63
EP - 78
L1 - http://proceedings.mlr.press/v13/xiao10a/xiao10a.pdf
UR - https://proceedings.mlr.press/v13/xiao10a.html
AB - Collapsed Gibbs sampling is a frequently applied method for approximating intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling method, however, has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy that significantly improves the efficiency of collapsed Gibbs sampling. The strategy is explored in terms of efficiency, convergence and perplexity. In addition, we present a straightforward parallelization to further improve efficiency. Finally, we underpin our proposed improvements with a comparative study on data sets of different scales.
ER -
APA
Xiao, H. & Stibor, T. (2010). Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:63-78. Available from https://proceedings.mlr.press/v13/xiao10a.html.
