Dual online inference for latent Dirichlet allocation

Khoat Than; Tung Doan

Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:80-95, 2015.

Abstract

Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof that MAP estimation of topic mixtures is tractable under certain conditions that may often hold in practice, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to a local optimum and, under certain conditions, to a global optimum. The discussion of OFW is very general and hence can be readily employed to accelerate MAP estimation in a wide class of probabilistic models. Extensive experiments show that DOLDA achieves significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA makes it easy to analyze text streams or collections of millions of documents.
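The abstract names OFW as a fast algorithm for MAP estimation of topic mixtures but does not spell out its steps. To make the underlying problem concrete, here is a minimal sketch that applies the classic, deterministic Frank-Wolfe method to the per-document MAP objective f(θ) = Σ_j d_j log(Σ_k θ_k β_kj) + (α − 1) Σ_k log θ_k over the probability simplex. It assumes a fixed topic matrix β with strictly positive entries and a symmetric Dirichlet parameter α ≥ 1, so the objective is concave. This is not the authors' OFW. The function name `map_topic_mixture` and its parameters are hypothetical.

```python
def map_topic_mixture(doc, beta, alpha=1.0, iters=200):
    """Plain Frank-Wolfe ascent for MAP topic proportions on the simplex.

    Illustrative sketch only (not the paper's OFW). Assumptions:
      doc:   dict mapping word id -> count d_j in the document
      beta:  list of K topics, each a list of word probabilities (all > 0)
      alpha: symmetric Dirichlet parameter, assumed >= 1 so f is concave
    """
    K = len(beta)
    theta = [1.0 / K] * K  # start at the barycenter of the simplex
    for t in range(iters):
        # x_j = sum_k theta_k * beta_kj for every word j in the document
        x = {j: sum(theta[k] * beta[k][j] for k in range(K)) for j in doc}
        # Gradient of f(theta) = sum_j d_j log x_j + (alpha - 1) sum_k log theta_k
        grad = [
            sum(d * beta[k][j] / x[j] for j, d in doc.items())
            + (alpha - 1.0) / theta[k]
            for k in range(K)
        ]
        # The linear subproblem over the simplex is solved by the vertex e_i
        # with the largest gradient coordinate.
        i = max(range(K), key=lambda k: grad[k])
        # Diminishing step size; gamma < 1 keeps theta strictly positive,
        # so log(theta_k) stays finite when alpha > 1.
        gamma = 2.0 / (t + 3)
        theta = [(1.0 - gamma) * th for th in theta]
        theta[i] += gamma
    return theta
```

As a usage example, take β = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]] and a document with counts {0: 5, 2: 1}. At α = 1, setting the derivative of f to zero gives θ₁ = 17/18 ≈ 0.944. The iterates of `map_topic_mixture(doc, beta)` should settle close to that value. Because each Frank-Wolfe step moves toward a single vertex, the iterates oscillate slightly around the optimum rather than landing on it exactly. Each iterate is also a convex combination of at most t + 1 vertices, which is one reason Frank-Wolfe-type methods tend to produce sparse topic mixtures.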

Cite this Paper

BibTeX


@InProceedings{pmlr-v39-than14,
  title = {Dual online inference for latent {D}irichlet allocation},
  author = {Than, Khoat and Doan, Tung},
  booktitle = {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages = {80--95},
  year = {2015},
  editor = {Phung, Dinh and Li, Hang},
  volume = {39},
  series = {Proceedings of Machine Learning Research},
  address = {Nha Trang City, Vietnam},
  month = {26--28 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v39/than14.pdf},
  url = {https://proceedings.mlr.press/v39/than14.html},
  abstract = {Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.}
}

Endnote

%0 Conference Paper
%T Dual online inference for latent Dirichlet allocation
%A Khoat Than
%A Tung Doan
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li
%F pmlr-v39-than14
%I PMLR
%P 80--95
%U https://proceedings.mlr.press/v39/than14.html
%V 39
%X Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.

RIS


TY  - CPAPER
TI  - Dual online inference for latent Dirichlet allocation
AU  - Khoat Than
AU  - Tung Doan
BT  - Proceedings of the Sixth Asian Conference on Machine Learning
DA  - 2015/02/16
ED  - Dinh Phung
ED  - Hang Li
ID  - pmlr-v39-than14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 39
SP  - 80
EP  - 95
L1  - http://proceedings.mlr.press/v39/than14.pdf
UR  - https://proceedings.mlr.press/v39/than14.html
AB  - Latent Dirichlet allocation (LDA) provides an efficient tool to analyze very large text collections. In this paper, we discuss three novel contributions: (1) a proof for the tractability of the MAP estimation of topic mixtures under certain conditions that might fit well with practices, even though the problem is known to be intractable in the worst case; (2) a provably fast algorithm (OFW) for inferring topic mixtures; (3) a dual online algorithm (DOLDA) for learning LDA at a large scale. We show that OFW converges to some local optima, but under certain conditions it can converge to global optima. The discussion of OFW is very general and hence can be readily employed to accelerate the MAP estimation in a wide class of probabilistic models. From extensive experiments we find that DOLDA can achieve significantly better predictive performance and more interpretable topics, with lower runtime, than stochastic variational inference. Further, DOLDA enables us to easily analyze text streams or millions of documents.
ER  -

APA


Than, K., & Doan, T. (2015). Dual online inference for latent Dirichlet allocation. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:80-95. Available from https://proceedings.mlr.press/v39/than14.html.
