A recursive estimate for the predictive likelihood in a topic model

James Scott, Jason Baldridge
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:527-535, 2013.

Abstract

We consider the problem of evaluating the predictive log likelihood of a previously un- seen document under a topic model. This task arises when cross-validating for a model hyperparameter, when testing a model on a hold-out set, and when comparing the performance of different fitting strategies. Yet it is known to be very challenging, as it is equivalent to estimating a marginal likelihood in Bayesian model selection. We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. The method is a first-order approximation to the algorithm of Carvalho et al. (2010), and can also be interpreted as a one-particle, Rao-Blackwellized version of the "left-to-right" method of Wallach et al. (2009). On our test examples, the proposed method gives similar answers to these other methods, but at lower computational cost.

Cite this Paper


BibTeX
@InProceedings{pmlr-v31-scott13a, title = {A recursive estimate for the predictive likelihood in a topic model}, author = {Scott, James and Baldridge, Jason}, booktitle = {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics}, pages = {527--535}, year = {2013}, editor = {Carvalho, Carlos M. and Ravikumar, Pradeep}, volume = {31}, series = {Proceedings of Machine Learning Research}, address = {Scottsdale, Arizona, USA}, month = {29 Apr--01 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v31/scott13a.pdf}, url = {http://proceedings.mlr.press/v31/scott13a.html}, abstract = {We consider the problem of evaluating the predictive log likelihood of a previously un- seen document under a topic model. This task arises when cross-validating for a model hyperparameter, when testing a model on a hold-out set, and when comparing the performance of different fitting strategies. Yet it is known to be very challenging, as it is equivalent to estimating a marginal likelihood in Bayesian model selection. We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. The method is a first-order approximation to the algorithm of Carvalho et al. (2010), and can also be interpreted as a one-particle, Rao-Blackwellized version of the "left-to-right" method of Wallach et al. (2009). On our test examples, the proposed method gives similar answers to these other methods, but at lower computational cost.} }
Endnote
%0 Conference Paper %T A recursive estimate for the predictive likelihood in a topic model %A James Scott %A Jason Baldridge %B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2013 %E Carlos M. Carvalho %E Pradeep Ravikumar %F pmlr-v31-scott13a %I PMLR %P 527--535 %U http://proceedings.mlr.press/v31/scott13a.html %V 31 %X We consider the problem of evaluating the predictive log likelihood of a previously un- seen document under a topic model. This task arises when cross-validating for a model hyperparameter, when testing a model on a hold-out set, and when comparing the performance of different fitting strategies. Yet it is known to be very challenging, as it is equivalent to estimating a marginal likelihood in Bayesian model selection. We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. The method is a first-order approximation to the algorithm of Carvalho et al. (2010), and can also be interpreted as a one-particle, Rao-Blackwellized version of the "left-to-right" method of Wallach et al. (2009). On our test examples, the proposed method gives similar answers to these other methods, but at lower computational cost.
RIS
TY - CPAPER TI - A recursive estimate for the predictive likelihood in a topic model AU - James Scott AU - Jason Baldridge BT - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics DA - 2013/04/29 ED - Carlos M. Carvalho ED - Pradeep Ravikumar ID - pmlr-v31-scott13a PB - PMLR DP - Proceedings of Machine Learning Research VL - 31 SP - 527 EP - 535 L1 - http://proceedings.mlr.press/v31/scott13a.pdf UR - http://proceedings.mlr.press/v31/scott13a.html AB - We consider the problem of evaluating the predictive log likelihood of a previously un- seen document under a topic model. This task arises when cross-validating for a model hyperparameter, when testing a model on a hold-out set, and when comparing the performance of different fitting strategies. Yet it is known to be very challenging, as it is equivalent to estimating a marginal likelihood in Bayesian model selection. We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. The method is a first-order approximation to the algorithm of Carvalho et al. (2010), and can also be interpreted as a one-particle, Rao-Blackwellized version of the "left-to-right" method of Wallach et al. (2009). On our test examples, the proposed method gives similar answers to these other methods, but at lower computational cost. ER -
APA
Scott, J. & Baldridge, J.. (2013). A recursive estimate for the predictive likelihood in a topic model. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:527-535 Available from http://proceedings.mlr.press/v31/scott13a.html.

Related Material