On Modelling Non-linear Topical Dependencies

Zhixing Li, Siqiang Wen, Juanzi Li, Peng Zhang, Jie Tang
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):458-466, 2014.

Abstract

Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting word co-occurrence. By observing the topical similarity between words, we find that other relations, such as semantic or syntactic relations between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. For inference, a new global factor is defined over all edges and the normalization factor of the GTRF is proven to be a constant. As a result, no independence assumption is needed during inference. Building on this, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-lib14,
  title = {On Modelling Non-linear Topical Dependencies},
  author = {Zhixing Li and Siqiang Wen and Juanzi Li and Peng Zhang and Jie Tang},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages = {458--466},
  year = {2014},
  editor = {Eric P. Xing and Tony Jebara},
  volume = {32},
  number = {1},
  series = {Proceedings of Machine Learning Research},
  address = {Beijing, China},
  month = {22--24 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v32/lib14.pdf},
  url = {http://proceedings.mlr.press/v32/lib14.html},
  abstract = {Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.}
}
Endnote
%0 Conference Paper
%T On Modelling Non-linear Topical Dependencies
%A Zhixing Li
%A Siqiang Wen
%A Juanzi Li
%A Peng Zhang
%A Jie Tang
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-lib14
%I PMLR
%J Proceedings of Machine Learning Research
%P 458--466
%U http://proceedings.mlr.press
%V 32
%N 1
%W PMLR
%X Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.
RIS
TY  - CPAPER
TI  - On Modelling Non-linear Topical Dependencies
AU  - Zhixing Li
AU  - Siqiang Wen
AU  - Juanzi Li
AU  - Peng Zhang
AU  - Jie Tang
BT  - Proceedings of the 31st International Conference on Machine Learning
PY  - 2014/01/27
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-lib14
PB  - PMLR
SP  - 458
DP  - PMLR
EP  - 466
L1  - http://proceedings.mlr.press/v32/lib14.pdf
UR  - http://proceedings.mlr.press/v32/lib14.html
AB  - Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting words’ co-occurring relation. By observing the topical similarity between words, we find that some other relations, such as semantic or syntax relation between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges and the normalization factor of GRF is proven to be a constant. As a result, no independent assumption is needed when inferring our model. Based on it, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear dependency topic models and produces better topic coherence.
ER  -
APA
Li, Z., Wen, S., Li, J., Zhang, P., & Tang, J. (2014). On Modelling Non-linear Topical Dependencies. Proceedings of the 31st International Conference on Machine Learning, in PMLR 32(1):458-466.
