On Estimation and Selection for Topic Models

Matt Taddy

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1184-1193, 2012.

Abstract

This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
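The abstract's marginal-likelihood step can be illustrated with a standard Laplace approximation around a posterior mode, where the information matrix is treated as block-diagonal. This is a minimal sketch under stated assumptions, not the paper's estimator, which is tied to its own topic-model parametrization. The function names and inputs (`block_diag_logdet`, `laplace_log_marginal`, and a list of positive-definite blocks of the negative log-posterior Hessian at the fitted mode) are hypothetical. The point it shows is that a block-diagonal approximation lets log|H| be computed as a sum of small block log-determinants, so the full matrix is never formed:

```python
import numpy as np

def block_diag_logdet(blocks):
    """Sum of log-determinants of positive-definite blocks.

    For H = diag(H_1, ..., H_B), log|H| = sum_b log|H_b|,
    so only the small blocks are ever factorized.
    """
    total = 0.0
    for h in blocks:
        sign, logdet = np.linalg.slogdet(h)
        if sign <= 0:
            raise ValueError("each block must be positive definite")
        total += logdet
    return total

def laplace_log_marginal(log_lik_at_mode, log_prior_at_mode, info_blocks):
    """Generic Laplace approximation to log p(X) at a posterior mode.

    log p(X) ~= log p(X|theta) + log p(theta) + (d/2) log(2*pi) - (1/2) log|H|,
    where H (negative Hessian of the log posterior) is assumed block-diagonal
    and supplied as a list of its blocks (hypothetical input format).
    """
    d = sum(h.shape[0] for h in info_blocks)  # total parameter dimension
    return (log_lik_at_mode + log_prior_at_mode
            + 0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * block_diag_logdet(info_blocks))
```

In a model-selection setting, one would fit the model for each candidate number of topics, evaluate this quantity at each fitted mode, and compare the resulting approximate log marginal likelihoods; how the blocks are defined for a topic model is an assumption left to the paper itself.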

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-taddy12,
  title = 	 {On Estimation and Selection for Topic Models},
  author = 	 {Taddy, Matt},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1184--1193},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/taddy12/taddy12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/taddy12.html},
  abstract = 	 {This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.}
}

Endnote

%0 Conference Paper
%T On Estimation and Selection for Topic Models
%A Matt Taddy
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-taddy12
%I PMLR
%P 1184--1193
%U https://proceedings.mlr.press/v22/taddy12.html
%V 22
%X This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.

RIS


TY  - CPAPER
TI  - On Estimation and Selection for Topic Models
AU  - Matt Taddy
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-taddy12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 1184
EP  - 1193
L1  - http://proceedings.mlr.press/v22/taddy12/taddy12.pdf
UR  - https://proceedings.mlr.press/v22/taddy12.html
AB  - This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
ER  -

APA


Taddy, M. (2012). On Estimation and Selection for Topic Models. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:1184-1193. Available from https://proceedings.mlr.press/v22/taddy12.html.
