On Modelling Non-linear Topical Dependencies

Zhixing Li; Siqiang Wen; Juanzi Li; Peng Zhang; Jie Tang

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):458-466, 2014.

Abstract

Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting word co-occurrence relations. By observing the topical similarity between words, we find that other relations, such as semantic or syntactic relations between words, also lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees, and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges, and the normalization factor of GRF is proven to be a constant. As a result, no independence assumption is needed when inferring our model. Building on this result, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.
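To make the difference between linear and non-linear dependencies concrete, the following is a minimal sketch that builds two edge sets for one sentence: the chain of adjacent words that a linear dependency model would link, and the (head, dependent) edges of a dependency tree. The sentence, its parse (the `heads` array, with -1 marking the root) and the helper names `linear_edges`, `tree_edges` and `non_adjacent` are hypothetical illustrations; the parse is hand-written rather than produced by a parser, and the code does not implement GTRF's global factor or its EM procedure.

```python
from typing import List, Tuple

# Undirected edge between two token positions, stored as (smaller, larger).
Edge = Tuple[int, int]


def linear_edges(n_tokens: int) -> List[Edge]:
    """Chain edges between adjacent tokens, as a linear (sequential) model links them."""
    return [(i, i + 1) for i in range(n_tokens - 1)]


def tree_edges(heads: List[int]) -> List[Edge]:
    """Undirected (head, dependent) edges from a head array.

    heads[i] is the index of token i's syntactic head, or -1 if token i is the root.
    """
    return [(min(h, i), max(h, i)) for i, h in enumerate(heads) if h >= 0]


def non_adjacent(edges: List[Edge]) -> List[Edge]:
    """Edges joining tokens that are not neighbours in the sentence.

    For a tree, these are exactly the tree edges that the linear chain lacks.
    """
    return [(a, b) for a, b in edges if b - a > 1]


if __name__ == "__main__":
    # Hypothetical sentence with a hand-written, UD-style parse (not parser output).
    tokens = ["The", "model", "infers", "topics", "from", "large", "corpora"]
    heads = [1, 2, -1, 2, 6, 6, 2]  # "infers" (index 2) is the root

    chain = linear_edges(len(tokens))
    tree = tree_edges(heads)

    # Both structures have n - 1 edges, but they connect different word pairs.
    print("tree-only:", [f"{tokens[a]}-{tokens[b]}" for a, b in non_adjacent(tree)])
    print("chain-only:", [f"{tokens[a]}-{tokens[b]}" for a, b in sorted(set(chain) - set(tree))])
```

Run as a script, it prints `tree-only: ['from-corpora', 'infers-corpora']` and `chain-only: ['topics-from', 'from-large']`: the tree links "infers" directly to its distant argument "corpora", a pair that a chain over adjacent words can only relate indirectly.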

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-lib14,
  title = 	 {On Modelling Non-linear Topical Dependencies},
  author = 	 {Li, Zhixing and Wen, Siqiang and Li, Juanzi and Zhang, Peng and Tang, Jie},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {458--466},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number = 	 {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/lib14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/lib14.html},
  abstract = 	 {Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.}
}

Endnote

%0 Conference Paper
%T On Modelling Non-linear Topical Dependencies
%A Zhixing Li
%A Siqiang Wen
%A Juanzi Li
%A Peng Zhang
%A Jie Tang
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-lib14
%I PMLR
%P 458--466
%U https://proceedings.mlr.press/v32/lib14.html
%V 32
%N 1
%X Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.

RIS


TY  - CPAPER
TI  - On Modelling Non-linear Topical Dependencies
AU  - Zhixing Li
AU  - Siqiang Wen
AU  - Juanzi Li
AU  - Peng Zhang
AU  - Jie Tang
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-lib14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 458
EP  - 466
L1  - http://proceedings.mlr.press/v32/lib14.pdf
UR  - https://proceedings.mlr.press/v32/lib14.html
AB  - Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.
ER  -

APA


Li, Z., Wen, S., Li, J., Zhang, P., & Tang, J. (2014). On Modelling Non-linear Topical Dependencies. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):458-466. Available from https://proceedings.mlr.press/v32/lib14.html.