A Word Embeddings Informed Focused Topic Model

He Zhao, Lan Du, Wray Buntine
Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:423-438, 2017.

Abstract

In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With the data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v77-zhao17a,
  title     = {A Word Embeddings Informed Focused Topic Model},
  author    = {Zhao, He and Du, Lan and Buntine, Wray},
  booktitle = {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages     = {423--438},
  year      = {2017},
  editor    = {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume    = {77},
  series    = {Proceedings of Machine Learning Research},
  address   = {Yonsei University, Seoul, Republic of Korea},
  month     = {15--17 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v77/zhao17a/zhao17a.pdf},
  url       = {https://proceedings.mlr.press/v77/zhao17a.html},
  abstract  = {In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With the data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.}
}
Endnote
%0 Conference Paper
%T A Word Embeddings Informed Focused Topic Model
%A He Zhao
%A Lan Du
%A Wray Buntine
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-zhao17a
%I PMLR
%P 423--438
%U https://proceedings.mlr.press/v77/zhao17a.html
%V 77
%X In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With the data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.
APA
Zhao, H., Du, L. & Buntine, W. (2017). A Word Embeddings Informed Focused Topic Model. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:423-438. Available from https://proceedings.mlr.press/v77/zhao17a.html.