Model Selection for Topic Models via Spectral Decomposition

Dehua Cheng; Xinran He; Yan Liu

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:183-191, 2015.

Abstract

Topic models have achieved significant success in analyzing large-scale text corpora. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following recent advances in topic models via tensor decomposition, we make a first attempt to provide a theoretical analysis of model selection in latent Dirichlet allocation. Under mild conditions, we derive upper and lower bounds on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are correct and tight. Furthermore, using the Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis in other latent models.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-cheng15,
  title = 	 {{Model Selection for Topic Models via Spectral Decomposition}},
  author = 	 {Cheng, Dehua and He, Xinran and Liu, Yan},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {183--191},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/cheng15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/cheng15.html},
  abstract = 	 {Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics.  Following the recent advances in topic models  via tensor decomposition, we make a first attempt to provide theoretical analysis on model selection in latent Dirichlet allocation. With mild conditions, we derive the upper bound and lower bound on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are correct and tight. Furthermore, using Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis in other latent models.}
}

Endnote

%0 Conference Paper
%T Model Selection for Topic Models via Spectral Decomposition
%A Dehua Cheng
%A Xinran He
%A Yan Liu
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-cheng15
%I PMLR
%P 183--191
%U https://proceedings.mlr.press/v38/cheng15.html
%V 38
%X Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics.  Following the recent advances in topic models  via tensor decomposition, we make a first attempt to provide theoretical analysis on model selection in latent Dirichlet allocation. With mild conditions, we derive the upper bound and lower bound on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are correct and tight. Furthermore, using Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis in other latent models.

RIS


TY  - CPAPER
TI  - Model Selection for Topic Models via Spectral Decomposition
AU  - Dehua Cheng
AU  - Xinran He
AU  - Yan Liu
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-cheng15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 183
EP  - 191
L1  - http://proceedings.mlr.press/v38/cheng15.pdf
UR  - https://proceedings.mlr.press/v38/cheng15.html
AB  - Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics.  Following the recent advances in topic models  via tensor decomposition, we make a first attempt to provide theoretical analysis on model selection in latent Dirichlet allocation. With mild conditions, we derive the upper bound and lower bound on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are correct and tight. Furthermore, using Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis in other latent models.
ER  -

APA


Cheng, D., He, X. & Liu, Y. (2015). Model Selection for Topic Models via Spectral Decomposition. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:183-191. Available from https://proceedings.mlr.press/v38/cheng15.html.
