Minimum Volume Topic Modeling

Byoungwook Jang, Alfred Hero
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3013-3021, 2019.

Abstract

We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. This allows topic modeling to be reformulated as finding the probability simplex that minimizes its volume and encloses the documents that are represented as distributions over words. A convex relaxation of the minimum volume topic model optimization is proposed, and it is shown that the relaxed problem has the same global minimum as the original problem under the separability assumption and the sufficiently scattered assumption introduced by Arora et al. (2013) and Huang et al. (2016). A locally convergent alternating direction method of multipliers (ADMM) approach is introduced for solving the relaxed minimum volume problem. Numerical experiments illustrate the benefits of our approach in terms of computation time and topic recovery performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-jang19a,
  title = {Minimum Volume Topic Modeling},
  author = {Jang, Byoungwook and Hero, Alfred},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = {3013--3021},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = {89},
  series = {Proceedings of Machine Learning Research},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v89/jang19a/jang19a.pdf},
  url = {https://proceedings.mlr.press/v89/jang19a.html},
  abstract = {We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. This allows topic modeling to be reformulated as finding the probability simplex that minimizes its volume and encloses the documents that are represented as distributions over words. A convex relaxation of the minimum volume topic model optimization is proposed, and it is shown that the relaxed problem has the same global minimum as the original problem under the separability assumption and the sufficiently scattered assumption introduced by Arora et al. (2013) and Huang et al. (2016). A locally convergent alternating direction method of multipliers (ADMM) approach is introduced for solving the relaxed minimum volume problem. Numerical experiments illustrate the benefits of our approach in terms of computation time and topic recovery performance.}
}
Endnote
%0 Conference Paper
%T Minimum Volume Topic Modeling
%A Byoungwook Jang
%A Alfred Hero
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-jang19a
%I PMLR
%P 3013--3021
%U https://proceedings.mlr.press/v89/jang19a.html
%V 89
%X We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. This allows topic modeling to be reformulated as finding the probability simplex that minimizes its volume and encloses the documents that are represented as distributions over words. A convex relaxation of the minimum volume topic model optimization is proposed, and it is shown that the relaxed problem has the same global minimum as the original problem under the separability assumption and the sufficiently scattered assumption introduced by Arora et al. (2013) and Huang et al. (2016). A locally convergent alternating direction method of multipliers (ADMM) approach is introduced for solving the relaxed minimum volume problem. Numerical experiments illustrate the benefits of our approach in terms of computation time and topic recovery performance.
APA
Jang, B. & Hero, A. (2019). Minimum Volume Topic Modeling. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:3013-3021. Available from https://proceedings.mlr.press/v89/jang19a.html.
