A recursive estimate for the predictive likelihood in a topic model


James Scott, Jason Baldridge ;
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:527-535, 2013.


We consider the problem of evaluating the predictive log likelihood of a previously un- seen document under a topic model. This task arises when cross-validating for a model hyperparameter, when testing a model on a hold-out set, and when comparing the performance of different fitting strategies. Yet it is known to be very challenging, as it is equivalent to estimating a marginal likelihood in Bayesian model selection. We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. The method is a first-order approximation to the algorithm of Carvalho et al. (2010), and can also be interpreted as a one-particle, Rao-Blackwellized version of the "left-to-right" method of Wallach et al. (2009). On our test examples, the proposed method gives similar answers to these other methods, but at lower computational cost.

Related Material