Dual online inference for latent Dirichlet allocation

Khoat Than, Tung Doan
Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:80-95, 2015.

Abstract

Latent Dirichlet allocation (LDA) provides an efficient tool for analyzing very large text collections. In this paper, we make three novel contributions: (1) a proof that MAP estimation of topic mixtures is tractable under certain conditions likely to hold in practice, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at large scale. We show that OFW converges to a local optimum and, under certain conditions, to a global optimum. The analysis of OFW is very general and can readily be employed to accelerate MAP estimation in a wide class of probabilistic models. Extensive experiments show that DOLDA achieves significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.
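To make the inference step concrete: OFW is a Frank-Wolfe-style algorithm, and the sketch below illustrates the general idea of Frank-Wolfe MAP estimation of a document's topic mixture over the probability simplex. This is an illustrative reconstruction, not the paper's exact algorithm; the function name, the symmetric prior `alpha`, and the step-size schedule are assumptions for the example.

```python
import numpy as np

def fw_map_topic_mixture(d, beta, alpha=1.1, iters=50):
    """Hypothetical sketch of Frank-Wolfe MAP inference for a topic mixture.

    d     : word-count vector for one document, shape (V,)
    beta  : topic-word matrix, shape (K, V), each row sums to 1
    alpha : symmetric Dirichlet prior; alpha > 1 keeps the
            log-posterior concave over the simplex
    """
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)  # start at the simplex barycentre
    for t in range(1, iters + 1):
        # gradient of  sum_j d_j log(theta @ beta[:, j]) + (alpha - 1) sum_k log theta_k
        p = theta @ beta                       # per-word mixture probabilities, shape (V,)
        grad = beta @ (d / p) + (alpha - 1.0) / theta
        i = int(np.argmax(grad))               # linear subproblem: best simplex vertex
        gamma = 2.0 / (t + 2.0)                # standard diminishing Frank-Wolfe step
        theta = (1.0 - gamma) * theta          # move toward vertex e_i
        theta[i] += gamma                      # theta stays on the simplex by construction
    return theta
```

Each iteration touches only one vertex of the simplex, which is what makes Frank-Wolfe-style inference cheap per step and, for concave objectives, provably convergent.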

Cite this Paper


BibTeX
@InProceedings{pmlr-v39-than14,
  title     = {Dual online inference for latent {D}irichlet allocation},
  author    = {Than, Khoat and Doan, Tung},
  booktitle = {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages     = {80--95},
  year      = {2015},
  editor    = {Phung, Dinh and Li, Hang},
  volume    = {39},
  series    = {Proceedings of Machine Learning Research},
  address   = {Nha Trang City, Vietnam},
  month     = {26--28 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v39/than14.pdf},
  url       = {https://proceedings.mlr.press/v39/than14.html},
  abstract  = {Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.}
}
Endnote
%0 Conference Paper
%T Dual online inference for latent Dirichlet allocation
%A Khoat Than
%A Tung Doan
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li
%F pmlr-v39-than14
%I PMLR
%P 80--95
%U https://proceedings.mlr.press/v39/than14.html
%V 39
%X Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.
RIS
TY - CPAPER
TI - Dual online inference for latent Dirichlet allocation
AU - Khoat Than
AU - Tung Doan
BT - Proceedings of the Sixth Asian Conference on Machine Learning
DA - 2015/02/16
ED - Dinh Phung
ED - Hang Li
ID - pmlr-v39-than14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 39
SP - 80
EP - 95
L1 - http://proceedings.mlr.press/v39/than14.pdf
UR - https://proceedings.mlr.press/v39/than14.html
AB - Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.
ER -
APA
Than, K. & Doan, T. (2015). Dual online inference for latent Dirichlet allocation. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:80-95. Available from https://proceedings.mlr.press/v39/than14.html.