Latent Gaussian Models for Topic Modeling

Changwei Hu, Eunsu Ryu, David Carlson, Yingjian Wang, Lawrence Carin
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:393-401, 2014.

Abstract

A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.
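The Laplace approximation the abstract refers to replaces an intractable posterior with a Gaussian centered at the posterior mode, with covariance given by the inverse curvature there. The following is an illustrative sketch only, not the paper's model: a toy one-parameter example with a Gaussian prior and a Poisson count likelihood, where the toy data `x`, `y` and the helper names are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: latent weight w with standard Gaussian prior N(0, 1) and a
# Poisson likelihood for counts y with rate exp(w * x). (Invented example;
# the paper's actual factorization model is more elaborate.)
def neg_log_posterior(w, x, y):
    rate = np.exp(w * x)
    log_lik = np.sum(y * np.log(rate) - rate)  # Poisson log-likelihood, up to a constant
    log_prior = -0.5 * w**2                    # standard Gaussian prior
    return -(log_lik + log_prior)

def laplace_approx(x, y):
    # Step 1: find the posterior mode (MAP estimate).
    res = minimize(lambda w: neg_log_posterior(w[0], x, y), x0=[0.0])
    w_map = res.x[0]
    # Step 2: the curvature (second derivative) at the mode gives the
    # precision of the Gaussian approximation; estimate it by finite differences.
    eps = 1e-4
    hess = (neg_log_posterior(w_map + eps, x, y)
            - 2.0 * neg_log_posterior(w_map, x, y)
            + neg_log_posterior(w_map - eps, x, y)) / eps**2
    return w_map, 1.0 / hess  # approximate posterior mean and variance

x = np.array([1.0, 2.0, 0.5, 1.5])
y = np.array([3, 9, 2, 5])
mean, var = laplace_approx(x, y)
```

Because the mode and curvature come from a single optimization rather than sampling, this kind of approximation is typically much cheaper than MCMC over Dirichlet-type constructions, which is the speed advantage the abstract claims.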

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-hu14,
  title     = {{Latent Gaussian Models for Topic Modeling}},
  author    = {Hu, Changwei and Ryu, Eunsu and Carlson, David and Wang, Yingjian and Carin, Lawrence},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages     = {393--401},
  year      = {2014},
  editor    = {Kaski, Samuel and Corander, Jukka},
  volume    = {33},
  series    = {Proceedings of Machine Learning Research},
  address   = {Reykjavik, Iceland},
  month     = {22--25 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v33/hu14.pdf},
  url       = {https://proceedings.mlr.press/v33/hu14.html},
  abstract  = {A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.}
}
Endnote
%0 Conference Paper
%T Latent Gaussian Models for Topic Modeling
%A Changwei Hu
%A Eunsu Ryu
%A David Carlson
%A Yingjian Wang
%A Lawrence Carin
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-hu14
%I PMLR
%P 393--401
%U https://proceedings.mlr.press/v33/hu14.html
%V 33
%X A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.
RIS
TY  - CPAPER
TI  - Latent Gaussian Models for Topic Modeling
AU  - Changwei Hu
AU  - Eunsu Ryu
AU  - David Carlson
AU  - Yingjian Wang
AU  - Lawrence Carin
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-hu14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 393
EP  - 401
L1  - http://proceedings.mlr.press/v33/hu14.pdf
UR  - https://proceedings.mlr.press/v33/hu14.html
AB  - A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.
ER  -
APA
Hu, C., Ryu, E., Carlson, D., Wang, Y. & Carin, L. (2014). Latent Gaussian Models for Topic Modeling. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:393-401. Available from https://proceedings.mlr.press/v33/hu14.html.