Prior-aware Composition Inference for Spectral Topic Models

Moontae Lee, David Bindel, David Mimno
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4258-4268, 2020.

Abstract

Spectral algorithms operate on matrices or tensors of word co-occurrence to learn latent topics. These approaches remove the dependence on the original documents and produce substantial gains in efficiency with provable inference, but at a cost: the models can no longer infer any information about individual documents. Thresholded Linear Inverse is developed to learn document-specific topic compositions, but its linear characteristics limit the inference quality without considering any prior information on topic distributions. We propose two novel estimation methods that respect previously unclear prior structures of spectral topic models. Experiments on a variety of synthetic to real collections demonstrate that our Prior-Aware Dual Decomposition outperforms the baseline method, whereas our Prior-Aware Manifold Iteration performs even better on short realistic data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-lee20c,
  title = {Prior-aware Composition Inference for Spectral Topic Models},
  author = {Lee, Moontae and Bindel, David and Mimno, David},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {4258--4268},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/lee20c/lee20c.pdf},
  url = {http://proceedings.mlr.press/v108/lee20c.html},
  abstract = {Spectral algorithms operate on matrices or tensors of word co-occurrence to learn latent topics. These approaches remove the dependence on the original documents and produce substantial gains in efficiency with provable inference, but at a cost: the models can no longer infer any information about individual documents. Thresholded Linear Inverse is developed to learn document-specific topic compositions, but its linear characteristics limit the inference quality without considering any prior information on topic distributions. We propose two novel estimation methods that respect previously unclear prior structures of spectral topic models. Experiments on a variety of synthetic to real collections demonstrate that our Prior-Aware Dual Decomposition outperforms the baseline method, whereas our Prior-Aware Manifold Iteration performs even better on short realistic data.}
}
Endnote
%0 Conference Paper
%T Prior-aware Composition Inference for Spectral Topic Models
%A Moontae Lee
%A David Bindel
%A David Mimno
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-lee20c
%I PMLR
%P 4258--4268
%U http://proceedings.mlr.press/v108/lee20c.html
%V 108
%X Spectral algorithms operate on matrices or tensors of word co-occurrence to learn latent topics. These approaches remove the dependence on the original documents and produce substantial gains in efficiency with provable inference, but at a cost: the models can no longer infer any information about individual documents. Thresholded Linear Inverse is developed to learn document-specific topic compositions, but its linear characteristics limit the inference quality without considering any prior information on topic distributions. We propose two novel estimation methods that respect previously unclear prior structures of spectral topic models. Experiments on a variety of synthetic to real collections demonstrate that our Prior-Aware Dual Decomposition outperforms the baseline method, whereas our Prior-Aware Manifold Iteration performs even better on short realistic data.
APA
Lee, M., Bindel, D. & Mimno, D. (2020). Prior-aware Composition Inference for Spectral Topic Models. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4258-4268. Available from http://proceedings.mlr.press/v108/lee20c.html.
