Latent Gaussian Models for Topic Modeling

Changwei Hu; Eunsu Ryu; David Carlson; Yingjian Wang; Lawrence Carin

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:393-401, 2014.

Abstract

A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-hu14,
  title = 	 {{Latent Gaussian Models for Topic Modeling}},
  author = 	 {Hu, Changwei and Ryu, Eunsu and Carlson, David and Wang, Yingjian and Carin, Lawrence},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {393--401},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/hu14.pdf},
  url = 	 {https://proceedings.mlr.press/v33/hu14.html},
  abstract = 	 {A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.}
}

Endnote

%0 Conference Paper
%T Latent Gaussian Models for Topic Modeling
%A Changwei Hu
%A Eunsu Ryu
%A David Carlson
%A Yingjian Wang
%A Lawrence Carin
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-hu14
%I PMLR
%P 393--401
%U https://proceedings.mlr.press/v33/hu14.html
%V 33
%X A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.

RIS


TY  - CPAPER
TI  - Latent Gaussian Models for Topic Modeling
AU  - Changwei Hu
AU  - Eunsu Ryu
AU  - David Carlson
AU  - Yingjian Wang
AU  - Lawrence Carin
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-hu14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 393
EP  - 401
L1  - http://proceedings.mlr.press/v33/hu14.pdf
UR  - https://proceedings.mlr.press/v33/hu14.html
AB  - A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.
ER  -

APA


Hu, C., Ryu, E., Carlson, D., Wang, Y. & Carin, L. (2014). Latent Gaussian Models for Topic Modeling. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:393-401. Available from https://proceedings.mlr.press/v33/hu14.html.
