Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network

Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2903-2913, 2021.

Abstract

Hierarchical topic models such as the gamma belief network (GBN) have delivered promising results in mining multi-layer document representations and discovering interpretable topic taxonomies. However, their priors often assume that the topics at each layer are drawn independently from a Dirichlet distribution. This ignores dependencies among topics, both within the same layer and across different layers. To relax this assumption, we propose the sawtooth factorial topic embedding guided GBN, a deep generative model of documents that captures the dependencies and semantic similarities between topics in an embedding space. Specifically, both words and topics are represented as embedding vectors of the same dimension. The topic matrix at a layer is factorized into the product of a factor loading matrix and a topic embedding matrix; the transpose of that topic embedding matrix is then used as the factor loading matrix of the layer above. Repeating this type of factorization, which shares components between adjacent layers, yields a structure we call sawtooth factorization. An auto-encoding variational inference network is constructed to optimize the model parameters via stochastic gradient descent. Experiments on large corpora show that our models outperform other neural topic models at extracting deeper interpretable topics and deriving better document representations.
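The sawtooth factorization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions:

- Each layer's topic matrix is taken to be a column-wise softmax of the embedding product (previous-layer embeddings transposed, times this layer's topic embeddings). The softmax keeps every topic on the probability simplex.
- The word embeddings serve as "layer 0".
- The GBN generative step uses fixed, hypothetical hyperparameters `r` and `scale` instead of the priors a full GBN would place on them.

All function names are illustrative only.

```python
import math
import random


def softmax_columns(scores):
    """Softmax each column of a (rows x cols) matrix so every column sums to 1."""
    rows, cols = len(scores), len(scores[0])
    out = [[0.0] * cols for _ in range(rows)]
    for k in range(cols):
        col = [scores[v][k] for v in range(rows)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in col]
        z = sum(exps)
        for v in range(rows):
            out[v][k] = exps[v] / z
    return out


def matmul_t(a, b):
    """Return a^T b for a (d x n) and b (d x m), giving an (n x m) matrix."""
    d, n, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][r] * b[i][c] for i in range(d)) for c in range(m)]
            for r in range(n)]


def sawtooth_topics(embeddings):
    """embeddings[0]: (d x V) word embeddings; embeddings[l]: (d x K_l) topic
    embeddings of layer l. Layer l's topic matrix (assumed form) is
    softmax_columns(embeddings[l-1]^T embeddings[l]), so embeddings[l] is the
    right factor at layer l and, transposed, the factor loading at layer l+1."""
    return [softmax_columns(matmul_t(embeddings[l - 1], embeddings[l]))
            for l in range(1, len(embeddings))]


def sample_poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small rates in this toy example."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_gbn_document(phis, rng, r=1.0, scale=1.0):
    """Draw one bag-of-words count vector from a GBN whose topic matrices are
    phis (phis[0] is layer 1). r and scale are hypothetical fixed values."""
    top_k = len(phis[-1][0])
    theta = [rng.gammavariate(r, scale) for _ in range(top_k)]
    # Walk down from the top hidden layer to layer 1:
    # theta^(l) ~ Gamma(Phi^(l+1) theta^(l+1), scale)
    for phi in reversed(phis[1:]):
        shapes = [sum(phi[i][j] * theta[j] for j in range(len(theta)))
                  for i in range(len(phi))]
        theta = [rng.gammavariate(s, scale) for s in shapes]
    # Word counts: x ~ Poisson(Phi^(1) theta^(1))
    rates = [sum(phis[0][v][k] * theta[k] for k in range(len(theta)))
             for v in range(len(phis[0]))]
    return [sample_poisson(lam, rng) for lam in rates]


# Usage: toy sizes and random embeddings, purely for illustration.
rng = random.Random(0)
d, V, K1, K2 = 4, 6, 3, 2


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


embeddings = [rand_matrix(d, V), rand_matrix(d, K1), rand_matrix(d, K2)]
phi1, phi2 = sawtooth_topics(embeddings)  # (V x K1) and (K1 x K2)
doc = sample_gbn_document([phi1, phi2], rng)
```

The layer-1 topic embeddings appear in both `phi1` and `phi2`. This shared component is what couples adjacent layers and gives the factorization its sawtooth shape. Topics that sit close together in embedding space therefore receive correlated topic vectors, rather than being drawn independently.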

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-duan21b,
  title = {Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network},
  author = {Duan, Zhibin and Wang, Dongsheng and Chen, Bo and Wang, Chaojie and Chen, Wenchao and Li, Yewen and Ren, Jie and Zhou, Mingyuan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {2903--2913},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/duan21b/duan21b.pdf},
  url = {https://proceedings.mlr.press/v139/duan21b.html},
  abstract = {Hierarchical topic models such as the gamma belief network (GBN) have delivered promising results in mining multi-layer document representations and discovering interpretable topic taxonomies. However, they often assume in the prior that the topics at each layer are independently drawn from the Dirichlet distribution, ignoring the dependencies between the topics both at the same layer and across different layers. To relax this assumption, we propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents that captures the dependencies and semantic similarities between the topics in the embedding space. Specifically, both the words and topics are represented as embedding vectors of the same dimension. The topic matrix at a layer is factorized into the product of a factor loading matrix and a topic embedding matrix, the transpose of which is set as the factor loading matrix of the layer above. Repeating this particular type of factorization, which shares components between adjacent layers, leads to a structure referred to as sawtooth factorization. An auto-encoding variational inference network is constructed to optimize the model parameter via stochastic gradient descent. Experiments on big corpora show that our models outperform other neural topic models on extracting deeper interpretable topics and deriving better document representations.}
}
Endnote
%0 Conference Paper
%T Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network
%A Zhibin Duan
%A Dongsheng Wang
%A Bo Chen
%A Chaojie Wang
%A Wenchao Chen
%A Yewen Li
%A Jie Ren
%A Mingyuan Zhou
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-duan21b
%I PMLR
%P 2903--2913
%U https://proceedings.mlr.press/v139/duan21b.html
%V 139
%X Hierarchical topic models such as the gamma belief network (GBN) have delivered promising results in mining multi-layer document representations and discovering interpretable topic taxonomies. However, they often assume in the prior that the topics at each layer are independently drawn from the Dirichlet distribution, ignoring the dependencies between the topics both at the same layer and across different layers. To relax this assumption, we propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents that captures the dependencies and semantic similarities between the topics in the embedding space. Specifically, both the words and topics are represented as embedding vectors of the same dimension. The topic matrix at a layer is factorized into the product of a factor loading matrix and a topic embedding matrix, the transpose of which is set as the factor loading matrix of the layer above. Repeating this particular type of factorization, which shares components between adjacent layers, leads to a structure referred to as sawtooth factorization. An auto-encoding variational inference network is constructed to optimize the model parameter via stochastic gradient descent. Experiments on big corpora show that our models outperform other neural topic models on extracting deeper interpretable topics and deriving better document representations.
APA
Duan, Z., Wang, D., Chen, B., Wang, C., Chen, W., Li, Y., Ren, J. & Zhou, M. (2021). Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2903-2913. Available from https://proceedings.mlr.press/v139/duan21b.html.
