Topic models conditioned on arbitrary features with Dirichlet-multinomial regression

David Mimno, Andrew McCallum
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, PMLR R6:411-418, 2008.

Abstract

Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
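As a rough illustration of what a log-linear prior on document-topic distributions can look like, the sketch below computes per-document Dirichlet parameters as alpha_k = exp(x · lambda_k), where x is the document's feature vector, and then draws a topic distribution from that Dirichlet. The feature names, the weight values, and the helper functions `dmr_alpha` and `sample_theta` are hypothetical and chosen only for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random

def dmr_alpha(features, lambdas):
    """Per-document Dirichlet parameters: alpha_k = exp(x . lambda_k)."""
    return [
        math.exp(sum(x * w for x, w in zip(features, lam)))
        for lam in lambdas
    ]

def sample_theta(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

# Hypothetical setup: 3 topics, and features laid out as
# [bias, published_at_venue_A, year_offset]. All weights are made up.
lambdas = [
    [0.0, 1.5, 0.0],   # topic 0: more likely for venue A
    [0.0, -1.0, 0.2],  # topic 1: less likely for venue A, grows with year
    [-0.5, 0.0, 0.0],  # topic 2: ignores metadata, low baseline
]
doc_features = [1.0, 1.0, 0.0]  # a venue-A document at the reference year

alpha = dmr_alpha(doc_features, lambdas)
theta = sample_theta(alpha, random.Random(0))
```

Because the document's metadata enters only through `alpha`, changing the feature vector changes the prior over topics without altering the rest of the generative process, which is what lets one model cover authors, venues, references, and dates through feature choice alone.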

Cite this Paper


BibTeX
@InProceedings{pmlr-vR6-mimno08a,
  title = {Topic models conditioned on arbitrary features with Dirichlet-multinomial regression},
  author = {Mimno, David and McCallum, Andrew},
  booktitle = {Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence},
  pages = {411--418},
  year = {2008},
  editor = {McAllester, David A. and Myllymäki, Petri},
  volume = {R6},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/r6/main/assets/mimno08a/mimno08a.pdf},
  url = {https://proceedings.mlr.press/r6/mimno08a.html},
  abstract = {Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.},
  note = {Reissued by PMLR on 09 October 2024.}
}
Endnote
%0 Conference Paper
%T Topic models conditioned on arbitrary features with Dirichlet-multinomial regression
%A David Mimno
%A Andrew McCallum
%B Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2008
%E David A. McAllester
%E Petri Myllymäki
%F pmlr-vR6-mimno08a
%I PMLR
%P 411--418
%U https://proceedings.mlr.press/r6/mimno08a.html
%V R6
%X Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
%Z Reissued by PMLR on 09 October 2024.
APA
Mimno, D. & McCallum, A. (2008). Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research R6:411-418. Available from https://proceedings.mlr.press/r6/mimno08a.html. Reissued by PMLR on 09 October 2024.
