Efficient inference for dynamic topic modeling with large vocabularies

Federico Tomasi, Mounia Lalmas, Zhenwen Dai
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1950-1959, 2022.

Abstract

Dynamic topic modeling is a well-established tool for capturing the temporal dynamics of the topics of a corpus. In this work, we develop a scalable dynamic topic model by utilizing the correlation among the words in the vocabulary. By correlating previously independent temporal processes for words, our new model allows us to reliably estimate the topic representations containing less frequent words. We develop an amortised variational inference method with a self-normalised importance sampling approximation to the word distribution that dramatically reduces the computational complexity and the number of variational parameters in order to handle large vocabularies. With extensive experiments on text datasets, we show that our method significantly outperforms previous work by modeling word correlations, and that it is able to handle real-world data with a large vocabulary that could not be processed by previous continuous dynamic topic models.
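To make the key scalability idea concrete, the sketch below illustrates self-normalised importance sampling over a large vocabulary in the abstract's sense: rather than computing the exact softmax normaliser over all V words, a small set of words is drawn from a proposal distribution and the importance weights are self-normalised. This is a minimal, assumption-laden illustration (uniform proposal, synthetic logits, names like `snis_normaliser` invented here), not the paper's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 10_000  # vocabulary size (illustrative; the paper targets much larger V)
logits = rng.normal(size=V)  # stand-in for a topic's unnormalised word scores


def exact_normaliser(logits):
    """Exact softmax normaliser: O(V) work, infeasible for very large V."""
    return np.exp(logits).sum()


def snis_normaliser(logits, n_samples=50_000):
    """Importance-sampling estimate of the softmax normaliser.

    Draws words from a uniform proposal q(w) = 1/V, so each importance
    weight is exp(logit_w) / q(w) = V * exp(logit_w), and the normaliser
    is estimated as the mean of those weights. Cost is O(n_samples),
    independent of V."""
    idx = rng.integers(0, len(logits), size=n_samples)
    return len(logits) * np.exp(logits[idx]).mean()


def snis_expectation(logits, f_values, n_samples=50_000):
    """Self-normalised estimate of E_p[f] under p(w) = softmax(logits).

    Weights are normalised by their own sum, so the (unknown) softmax
    normaliser cancels — the 'self-normalised' part of SNIS."""
    idx = rng.integers(0, len(logits), size=n_samples)
    w = np.exp(logits[idx])  # uniform proposal: weights proportional to exp(logit)
    return (w * f_values[idx]).sum() / w.sum()


Z_exact = exact_normaliser(logits)
Z_snis = snis_normaliser(logits)
print(f"exact Z = {Z_exact:.1f}, SNIS Z = {Z_snis:.1f}")
```

Because the self-normalised estimator only ever touches the sampled words, the per-update cost (and the number of variational parameters that need to be handled per step) no longer scales with the full vocabulary size, which is the property the abstract highlights.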

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-tomasi22a,
  title     = {Efficient inference for dynamic topic modeling with large vocabularies},
  author    = {Tomasi, Federico and Lalmas, Mounia and Dai, Zhenwen},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1950--1959},
  year      = {2022},
  editor    = {Cussens, James and Zhang, Kun},
  volume    = {180},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--05 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v180/tomasi22a/tomasi22a.pdf},
  url       = {https://proceedings.mlr.press/v180/tomasi22a.html},
  abstract  = {Dynamic topic modeling is a well established tool for capturing the temporal dynamics of the topics of a corpus. In this work, we develop a scalable dynamic topic model by utilizing the correlation among the words in the vocabulary. By correlating previously independent temporal processes for words, our new model allows us to reliably estimate the topic representations containing less frequent words. We develop an amortised variational inference method with self-normalised importance sampling approximation to the word distribution that dramatically reduces the computational complexity and the number of variational parameters in order to handle large vocabularies. With extensive experiments on text datasets, we show that our method significantly outperforms the previous works by modeling word correlations, and it is able to handle real world data with a large vocabulary which could not be processed by previous continuous dynamic topic models.}
}
Endnote
%0 Conference Paper
%T Efficient inference for dynamic topic modeling with large vocabularies
%A Federico Tomasi
%A Mounia Lalmas
%A Zhenwen Dai
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-tomasi22a
%I PMLR
%P 1950--1959
%U https://proceedings.mlr.press/v180/tomasi22a.html
%V 180
%X Dynamic topic modeling is a well established tool for capturing the temporal dynamics of the topics of a corpus. In this work, we develop a scalable dynamic topic model by utilizing the correlation among the words in the vocabulary. By correlating previously independent temporal processes for words, our new model allows us to reliably estimate the topic representations containing less frequent words. We develop an amortised variational inference method with self-normalised importance sampling approximation to the word distribution that dramatically reduces the computational complexity and the number of variational parameters in order to handle large vocabularies. With extensive experiments on text datasets, we show that our method significantly outperforms the previous works by modeling word correlations, and it is able to handle real world data with a large vocabulary which could not be processed by previous continuous dynamic topic models.
APA
Tomasi, F., Lalmas, M. & Dai, Z. (2022). Efficient inference for dynamic topic modeling with large vocabularies. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1950-1959. Available from https://proceedings.mlr.press/v180/tomasi22a.html.