Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data

Zhiyuan Chen, Bing Liu
; Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):703-711, 2014.

Abstract

Topic modeling has been commonly used to discover topics from document collections. However, unsupervised models can generate many incoherent topics. To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. This work advances this research much further and shows that without any user input, we can mine the prior knowledge automatically and dynamically from topics already found from a large number of domains. This paper first proposes a novel method to mine such prior knowledge dynamically in the modeling process, and then a new topic model to use the knowledge to guide the model inference. What is also interesting is that this approach offers a novel lifelong learning algorithm for topic discovery, which exploits the big (past) data and knowledge gained from such data for subsequent modeling. Our experimental results using product reviews from 50 domains demonstrate the effectiveness of the proposed approach.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-chenf14, title = {Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data}, author = {Zhiyuan Chen and Bing Liu}, booktitle = {Proceedings of the 31st International Conference on Machine Learning}, pages = {703--711}, year = {2014}, editor = {Eric P. Xing and Tony Jebara}, volume = {32}, number = {2}, series = {Proceedings of Machine Learning Research}, address = {Bejing, China}, month = {22--24 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v32/chenf14.pdf}, url = {http://proceedings.mlr.press/v32/chenf14.html}, abstract = {Topic modeling has been commonly used to discover topics from document collections. However, unsupervised models can generate many incoherent topics. To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. This work advances this research much further and shows that without any user input, we can mine the prior knowledge automatically and dynamically from topics already found from a large number of domains. This paper first proposes a novel method to mine such prior knowledge dynamically in the modeling process, and then a new topic model to use the knowledge to guide the model inference. What is also interesting is that this approach offers a novel lifelong learning algorithm for topic discovery, which exploits the big (past) data and knowledge gained from such data for subsequent modeling. Our experimental results using product reviews from 50 domains demonstrate the effectiveness of the proposed approach.} }
Endnote
%0 Conference Paper %T Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data %A Zhiyuan Chen %A Bing Liu %B Proceedings of the 31st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2014 %E Eric P. Xing %E Tony Jebara %F pmlr-v32-chenf14 %I PMLR %J Proceedings of Machine Learning Research %P 703--711 %U http://proceedings.mlr.press %V 32 %N 2 %W PMLR %X Topic modeling has been commonly used to discover topics from document collections. However, unsupervised models can generate many incoherent topics. To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. This work advances this research much further and shows that without any user input, we can mine the prior knowledge automatically and dynamically from topics already found from a large number of domains. This paper first proposes a novel method to mine such prior knowledge dynamically in the modeling process, and then a new topic model to use the knowledge to guide the model inference. What is also interesting is that this approach offers a novel lifelong learning algorithm for topic discovery, which exploits the big (past) data and knowledge gained from such data for subsequent modeling. Our experimental results using product reviews from 50 domains demonstrate the effectiveness of the proposed approach.
RIS
TY - CPAPER TI - Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data AU - Zhiyuan Chen AU - Bing Liu BT - Proceedings of the 31st International Conference on Machine Learning PY - 2014/01/27 DA - 2014/01/27 ED - Eric P. Xing ED - Tony Jebara ID - pmlr-v32-chenf14 PB - PMLR SP - 703 DP - PMLR EP - 711 L1 - http://proceedings.mlr.press/v32/chenf14.pdf UR - http://proceedings.mlr.press/v32/chenf14.html AB - Topic modeling has been commonly used to discover topics from document collections. However, unsupervised models can generate many incoherent topics. To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. This work advances this research much further and shows that without any user input, we can mine the prior knowledge automatically and dynamically from topics already found from a large number of domains. This paper first proposes a novel method to mine such prior knowledge dynamically in the modeling process, and then a new topic model to use the knowledge to guide the model inference. What is also interesting is that this approach offers a novel lifelong learning algorithm for topic discovery, which exploits the big (past) data and knowledge gained from such data for subsequent modeling. Our experimental results using product reviews from 50 domains demonstrate the effectiveness of the proposed approach. ER -
APA
Chen, Z. & Liu, B.. (2014). Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data. Proceedings of the 31st International Conference on Machine Learning, in PMLR 32(2):703-711

Related Material