On Estimation and Selection for Topic Models


Matt Taddy
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1184-1193, 2012.


This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
