Cross-structural Factor-topic Model: Document Analysis with Sophisticated Covariates

Chien Lu, Jaakko Peltonen, Timo Nummenmaa, Jyrki Nummenmaa, Kalervo Järvelin
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1129-1144, 2021.

Abstract

Modern text data is increasingly gathered in situations where it is paired with a high-dimensional collection of covariates: then both the text, the covariates, and their relationships are of interest to analyze. Despite the growing amount of such data, current topic models are unable to take into account large amounts of covariates successfully: they fail to model structure among covariates and distort findings of both text and covariates. This paper presents a solution: a novel factor-topic model that enables researchers to analyze latent structure in both text and sophisticated document-level covariates collectively. The key innovation is that besides learning the underlying topical structure, the model also learns the underlying factorial structure from the covariates and the interactions between the two structures. A set of tailored variational inference algorithms for efficient computation are provided. Experiments on three different datasets show the model outperforms comparable topic models in the ability to predict held-out document content. Two case studies focusing on Finnish parliamentary election candidates and game players on Steam demonstrate the model discovers semantically meaningful topics, factors, and their interactions. The model both outperforms state-of-the-art models in predictive accuracy and offers new factor-topic insights beyond other topic models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-lu21a,
  title = {Cross-structural Factor-topic Model: Document Analysis with Sophisticated Covariates},
  author = {Lu, Chien and Peltonen, Jaakko and Nummenmaa, Timo and Nummenmaa, Jyrki and J\"arvelin, Kalervo},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {1129--1144},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/lu21a/lu21a.pdf},
  url = {https://proceedings.mlr.press/v157/lu21a.html},
  abstract = {Modern text data is increasingly gathered in situations where it is paired with a high-dimensional collection of covariates: then both the text, the covariates, and their relationships are of interest to analyze. Despite the growing amount of such data, current topic models are unable to take into account large amounts of covariates successfully: they fail to model structure among covariates and distort findings of both text and covariates. This paper presents a solution: a novel factor-topic model that enables researchers to analyze latent structure in both text and sophisticated document-level covariates collectively. The key innovation is that besides learning the underlying topical structure, the model also learns the underlying factorial structure from the covariates and the interactions between the two structures. A set of tailored variational inference algorithms for efficient computation are provided. Experiments on three different datasets show the model outperforms comparable topic models in the ability to predict held-out document content. Two case studies focusing on Finnish parliamentary election candidates and game players on Steam demonstrate the model discovers semantically meaningful topics, factors, and their interactions. The model both outperforms state-of-the-art models in predictive accuracy and offers new factor-topic insights beyond other topic models.}
}
Endnote
%0 Conference Paper
%T Cross-structural Factor-topic Model: Document Analysis with Sophisticated Covariates
%A Chien Lu
%A Jaakko Peltonen
%A Timo Nummenmaa
%A Jyrki Nummenmaa
%A Kalervo Järvelin
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-lu21a
%I PMLR
%P 1129--1144
%U https://proceedings.mlr.press/v157/lu21a.html
%V 157
%X Modern text data is increasingly gathered in situations where it is paired with a high-dimensional collection of covariates: then both the text, the covariates, and their relationships are of interest to analyze. Despite the growing amount of such data, current topic models are unable to take into account large amounts of covariates successfully: they fail to model structure among covariates and distort findings of both text and covariates. This paper presents a solution: a novel factor-topic model that enables researchers to analyze latent structure in both text and sophisticated document-level covariates collectively. The key innovation is that besides learning the underlying topical structure, the model also learns the underlying factorial structure from the covariates and the interactions between the two structures. A set of tailored variational inference algorithms for efficient computation are provided. Experiments on three different datasets show the model outperforms comparable topic models in the ability to predict held-out document content. Two case studies focusing on Finnish parliamentary election candidates and game players on Steam demonstrate the model discovers semantically meaningful topics, factors, and their interactions. The model both outperforms state-of-the-art models in predictive accuracy and offers new factor-topic insights beyond other topic models.
APA
Lu, C., Peltonen, J., Nummenmaa, T., Nummenmaa, J. & Järvelin, K. (2021). Cross-structural Factor-topic Model: Document Analysis with Sophisticated Covariates. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1129-1144. Available from https://proceedings.mlr.press/v157/lu21a.html.
