Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model

Xianghua Fu; Ting Wang; Jing Li; Chong Yu; Wangwang Liu

Proceedings of The 8th Asian Conference on Machine Learning, PMLR 63:190-205, 2016.

Abstract

We propose a Word-Topic Mixture (WTM) model that improves word representations and topic models simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on Latent Dirichlet Allocation (LDA), to learn word embeddings and topic vectors. The results learned from TWE are then integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. Experimental results show that the WTM model performs better on word representation and topic detection than some state-of-the-art models.
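The KL-divergence step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the embedding-based topic-word distribution is a softmax over dot products between a topic vector and each word embedding (the form commonly used for latent-feature topic models), and it assumes the divergence is taken from the embedding-based distribution to the LDA one, since the abstract does not state the direction. All names and toy values below are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of real-valued scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embedding_topic_word_dist(topic_vec, word_vecs):
    # Assumed form: p(w | t) proportional to exp(topic_vec . word_vec).
    scores = [sum(t * w for t, w in zip(topic_vec, vec)) for vec in word_vecs]
    return softmax(scores)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same vocabulary.
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Toy vocabulary of three words with 2-dimensional embeddings (made-up values).
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_vec = [0.5, -0.5]

# Hypothetical topic-word distribution from a plain LDA model for the same topic.
lda_phi = [0.5, 0.2, 0.3]

emb_phi = embedding_topic_word_dist(topic_vec, word_vecs)
kl = kl_divergence(emb_phi, lda_phi)
print("embedding-based distribution:", emb_phi)
print("KL(embedding || LDA):", kl)
```

In a training loop, a quantity like `kl` would be the term to drive down when fitting the topic vectors; the optimizer and the exact objective used in the paper are not described in the abstract and are left out here.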

Cite this Paper

BibTeX


@InProceedings{pmlr-v63-Fu60,
  title = 	 {Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model},
  author = 	 {Fu, Xianghua and Wang, Ting and Li, Jing and Yu, Chong and Liu, Wangwang},
  booktitle = 	 {Proceedings of The 8th Asian Conference on Machine Learning},
  pages = 	 {190--205},
  year = 	 {2016},
  editor = 	 {Durrant, Robert J. and Kim, Kee-Eung},
  volume = 	 {63},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {The University of Waikato, Hamilton, New Zealand},
  month = 	 {16--18 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v63/Fu60.pdf},
  url = 	 {https://proceedings.mlr.press/v63/Fu60.html},
  abstract = 	 {We propose a Word-Topic Mixture (WTM) model that improves word representations and topic models simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on Latent Dirichlet Allocation (LDA), to learn word embeddings and topic vectors. The results learned from TWE are then integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. Experimental results show that the WTM model performs better on word representation and topic detection than some state-of-the-art models.}
}

Endnote

%0 Conference Paper
%T Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
%A Xianghua Fu
%A Ting Wang
%A Jing Li
%A Chong Yu
%A Wangwang Liu
%B Proceedings of The 8th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Robert J. Durrant
%E Kee-Eung Kim
%F pmlr-v63-Fu60
%I PMLR
%P 190--205
%U https://proceedings.mlr.press/v63/Fu60.html
%V 63
%X We propose a Word-Topic Mixture (WTM) model that improves word representations and topic models simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on Latent Dirichlet Allocation (LDA), to learn word embeddings and topic vectors. The results learned from TWE are then integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. Experimental results show that the WTM model performs better on word representation and topic detection than some state-of-the-art models.

RIS


TY  - CPAPER
TI  - Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
AU  - Xianghua Fu
AU  - Ting Wang
AU  - Jing Li
AU  - Chong Yu
AU  - Wangwang Liu
BT  - Proceedings of The 8th Asian Conference on Machine Learning
DA  - 2016/11/20
ED  - Robert J. Durrant
ED  - Kee-Eung Kim
ID  - pmlr-v63-Fu60
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 63
SP  - 190
EP  - 205
L1  - http://proceedings.mlr.press/v63/Fu60.pdf
UR  - https://proceedings.mlr.press/v63/Fu60.html
AB  - We propose a Word-Topic Mixture (WTM) model that improves word representations and topic models simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on Latent Dirichlet Allocation (LDA), to learn word embeddings and topic vectors. The results learned from TWE are then integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. Experimental results show that the WTM model performs better on word representation and topic detection than some state-of-the-art models.
ER  -

APA


Fu, X., Wang, T., Li, J., Yu, C. & Liu, W. (2016). Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model. Proceedings of The 8th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 63:190-205. Available from https://proceedings.mlr.press/v63/Fu60.html.
