On Modelling Non-linear Topical Dependencies

Zhixing Li, Siqiang Wen, Juanzi Li, Peng Zhang, Jie Tang
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):458-466, 2014.

Abstract

Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting word co-occurrence relations. By observing the topical similarity between words, we find that other relations, such as semantic or syntactic relations between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges, and the normalization factor of the GRF is proven to be a constant. As a result, no independence assumption is needed when inferring our model. Based on this, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear-dependency topic models and produces better topic coherence.
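As a rough illustration of the construction described above (the specific potential ψ and the notation are our own assumptions, not the paper's exact definitions): letting E be the set of dependency-tree edges in a document and z its topic assignments, a single global factor over all edges modifies the LDA distribution as

\[
p_{\mathrm{GTRF}}(\mathbf{z}\mid\theta,\Phi)\;\propto\;p_{\mathrm{LDA}}(\mathbf{z}\mid\theta,\Phi)\,\psi(\mathbf{z},E),
\qquad
Z(\theta,\Phi)=\sum_{\mathbf{z}}p_{\mathrm{LDA}}(\mathbf{z}\mid\theta,\Phi)\,\psi(\mathbf{z},E),
\]

where ψ could, for instance, reward the fraction of edges whose endpoints are assigned the same topic. The abstract's claim that the normalization factor is a constant means Z can be dropped during inference, which is why no factorized (independence) approximation over the edges is needed in the EM procedure.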

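For the evaluation mentioned in the abstract, held-out perplexity is the standard topic-model metric: the exponentiated negative average per-token log-likelihood of unseen text, so lower is better. A minimal sketch of the computation, assuming per-document log-likelihoods are already available (function and variable names are ours, not from the paper):

import math

def corpus_perplexity(doc_log_likelihoods, doc_lengths):
    # doc_log_likelihoods: natural-log likelihood of each held-out document under the model
    # doc_lengths: number of tokens in each held-out document
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_lengths)
    return math.exp(-total_ll / total_tokens)

# Comparing models on the same held-out split: the model with the lower
# perplexity assigns higher probability to the unseen text.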
Cite this Paper


BibTeX
@InProceedings{pmlr-v32-lib14,
  title     = {On Modelling Non-linear Topical Dependencies},
  author    = {Li, Zhixing and Wen, Siqiang and Li, Juanzi and Zhang, Peng and Tang, Jie},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {458--466},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {1},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/lib14.pdf},
  url       = {https://proceedings.mlr.press/v32/lib14.html},
  abstract  = {Probabilistic topic models such as Latent Dirichlet Allocation (LDA) discover latent topics from large corpora by exploiting word co-occurrence relations. By observing the topical similarity between words, we find that other relations, such as semantic or syntactic relations between words, lead to strong dependence between their topics. In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. To infer our model, a new global factor is defined over all edges, and the normalization factor of the GRF is proven to be a constant. As a result, no independence assumption is needed when inferring our model. Based on this, we develop an efficient expectation-maximization (EM) procedure for parameter estimation. Experimental results on four data sets show that GTRF achieves much lower perplexity than LDA and linear-dependency topic models and produces better topic coherence.}
}
APA
Li, Z., Wen, S., Li, J., Zhang, P. & Tang, J. (2014). On Modelling Non-linear Topical Dependencies. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):458-466. Available from https://proceedings.mlr.press/v32/lib14.html.
