Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model

Xianghua Fu, Ting Wang, Jing Li, Chong Yu, Wangwang Liu
Proceedings of The 8th Asian Conference on Machine Learning, PMLR 63:190-205, 2016.

Abstract

We propose a Word-Topic Mixture (WTM) model that improves word representation and topic modeling simultaneously. First, WTM introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on Latent Dirichlet Allocation (LDA), to learn word embeddings and topic vectors. The results learned by TWE are then integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. Experimental results show that the WTM model outperforms several state-of-the-art models on word representation and topic detection.
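As a rough sketch of the mechanism the abstract describes (assumed LFLDA-style notation, not the paper's own formulation): the topic-word distribution can be written as a mixture of a latent-feature component, parameterized by a topic vector τ_t and word embeddings ω_w, and the usual Dirichlet-multinomial component φ_t, with a KL term tying the mixed distribution back to the original topic-word distribution.

% Illustrative only: the symbols (\lambda, \tau_t, \omega_w, \phi_t, V) are assumptions, not taken from the paper.
P(w \mid z = t) \;=\; \lambda \, \frac{\exp(\omega_w \cdot \tau_t)}{\sum_{w' \in V} \exp(\omega_{w'} \cdot \tau_t)} \;+\; (1 - \lambda)\, \phi_{t,w},
\qquad
\mathcal{L}_{\mathrm{KL}} \;=\; \sum_{t} \mathrm{KL}\!\left( P(\cdot \mid z = t) \,\middle\|\, \phi_t \right)

Here \lambda is a mixing weight and V the vocabulary; the exact objective used in WTM may differ from this sketch.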

Cite this Paper


BibTeX
@InProceedings{pmlr-v63-Fu60,
  title     = {Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model},
  author    = {Xianghua Fu and Ting Wang and Jing Li and Chong Yu and Wangwang Liu},
  pages     = {190--205},
  year      = {2016},
  editor    = {Robert J. Durrant and Kee-Eung Kim},
  volume    = {63},
  series    = {Proceedings of Machine Learning Research},
  address   = {The University of Waikato, Hamilton, New Zealand},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v63/Fu60.pdf},
  url       = {http://proceedings.mlr.press/v63/Fu60.html},
  abstract  = {We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.}
}
Endnote
%0 Conference Paper
%T Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
%A Xianghua Fu
%A Ting Wang
%A Jing Li
%A Chong Yu
%A Wangwang Liu
%B Proceedings of The 8th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Robert J. Durrant
%E Kee-Eung Kim
%F pmlr-v63-Fu60
%I PMLR
%J Proceedings of Machine Learning Research
%P 190--205
%U http://proceedings.mlr.press
%V 63
%W PMLR
%X We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.
RIS
TY - CPAPER
TI - Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
AU - Xianghua Fu
AU - Ting Wang
AU - Jing Li
AU - Chong Yu
AU - Wangwang Liu
BT - Proceedings of The 8th Asian Conference on Machine Learning
PY - 2016/11/20
DA - 2016/11/20
ED - Robert J. Durrant
ED - Kee-Eung Kim
ID - pmlr-v63-Fu60
PB - PMLR
SP - 190
DP - PMLR
EP - 205
L1 - http://proceedings.mlr.press/v63/Fu60.pdf
UR - http://proceedings.mlr.press/v63/Fu60.html
AB - We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.
ER -
APA
Fu, X., Wang, T., Li, J., Yu, C. & Liu, W. (2016). Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model. Proceedings of The 8th Asian Conference on Machine Learning, in PMLR 63:190-205.

Related Material