Latent topic models for hypertext

Amit Gruber, Michal Rosen-Zvi, Yair Weiss
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, PMLR R6:230-239, 2008.

Abstract

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words: the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
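The link-generation idea in the abstract can be sketched numerically. In the following minimal sketch, the probability that a word with topic z links to document d is taken proportional to d's topic proportion for z times d's in-degree; the exact parameterization and normalization in the paper may differ, so treat `theta`, `in_degree`, and the product form as illustrative assumptions rather than the paper's precise model.

```python
import numpy as np

def link_target_probs(theta, in_degree, z):
    """Hypothetical link-target distribution given the linking word's topic.

    theta:     (n_docs, n_topics) array of per-document topic proportions.
    in_degree: (n_docs,) array of in-link counts per document.
    z:         topic index assigned to the linking word.

    Returns a normalized probability over candidate target documents,
    proportional to theta[:, z] * in_degree (an assumed parameterization).
    """
    scores = theta[:, z] * in_degree
    return scores / scores.sum()
```

Under this sketch, a document scores highly as a link target only if it is both topically relevant to the linking word and frequently linked to overall, which is the dependence the abstract describes.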

Cite this Paper


BibTeX
@InProceedings{pmlr-vR6-gruber08a,
  title = {Latent topic models for hypertext},
  author = {Gruber, Amit and Rosen-Zvi, Michal and Weiss, Yair},
  booktitle = {Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence},
  pages = {230--239},
  year = {2008},
  editor = {McAllester, David A. and Myllymäki, Petri},
  volume = {R6},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/r6/main/assets/gruber08a/gruber08a.pdf},
  url = {https://proceedings.mlr.press/r6/gruber08a.html},
  abstract = {Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words: the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.},
  note = {Reissued by PMLR on 09 October 2024.}
}
Endnote
%0 Conference Paper
%T Latent topic models for hypertext
%A Amit Gruber
%A Michal Rosen-Zvi
%A Yair Weiss
%B Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2008
%E David A. McAllester
%E Petri Myllymäki
%F pmlr-vR6-gruber08a
%I PMLR
%P 230--239
%U https://proceedings.mlr.press/r6/gruber08a.html
%V R6
%X Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words: the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
%Z Reissued by PMLR on 09 October 2024.
APA
Gruber, A., Rosen-Zvi, M. & Weiss, Y. (2008). Latent topic models for hypertext. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research R6:230-239. Available from https://proceedings.mlr.press/r6/gruber08a.html. Reissued by PMLR on 09 October 2024.