Hidden Topic Markov Models

Amit Gruber; Yair Weiss; Michal Rosen-Zvi

Hidden Topic Markov Models

Amit Gruber, Yair Weiss, Michal Rosen-Zvi

Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:163-170, 2007.

Abstract

Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.

Cite this Paper

BibTeX


@InProceedings{pmlr-v2-gruber07a,
  title = 	 {Hidden Topic Markov Models},
  author = 	 {Gruber, Amit and Weiss, Yair and Rosen-Zvi, Michal},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = 	 {163--170},
  year = 	 {2007},
  editor = 	 {Meila, Marina and Shen, Xiaotong},
  volume = 	 {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Juan, Puerto Rico},
  month = 	 {21--24 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v2/gruber07a/gruber07a.pdf},
  url = 	 {https://proceedings.mlr.press/v2/gruber07a.html},
  abstract = 	 {Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.}
}

Endnote

%0 Conference Paper
%T Hidden Topic Markov Models
%A Amit Gruber
%A Yair Weiss
%A Michal Rosen-Zvi
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen	
%F pmlr-v2-gruber07a
%I PMLR
%P 163--170
%U https://proceedings.mlr.press/v2/gruber07a.html
%V 2
%X Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.

RIS


TY  - CPAPER
TI  - Hidden Topic Markov Models
AU  - Amit Gruber
AU  - Yair Weiss
AU  - Michal Rosen-Zvi
BT  - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA  - 2007/03/11
ED  - Marina Meila
ED  - Xiaotong Shen	
ID  - pmlr-v2-gruber07a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 2
SP  - 163
EP  - 170
L1  - http://proceedings.mlr.press/v2/gruber07a/gruber07a.pdf
UR  - https://proceedings.mlr.press/v2/gruber07a.html
AB  - Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.
ER  -

APA


Gruber, A., Weiss, Y. & Rosen-Zvi, M.. (2007). Hidden Topic Markov Models. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:163-170 Available from https://proceedings.mlr.press/v2/gruber07a.html.

Related Material

Download PDF