Semi-Supervised Prediction-Constrained Topic Models

Michael Hughes, Gabriel Hope, Leah Weiner, Thomas McCoy, Roy Perlis, Erik Sudderth, Finale Doshi-Velez
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1067-1076, 2018.

Abstract

Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.
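The prediction-constrained objective the abstract describes can be sketched in math form. This is a hedged reconstruction for orientation only, not the paper's exact notation: here \(\theta\) stands for all topic-model parameters, \(x_d\) for document \(d\)'s word counts, \(y_d\) for its class label, \(\mathcal{L}\) for the labeled subset of documents, and \(\lambda > 0\) for a multiplier trading off the two goals.

```latex
% Constrained form: maximize generative fit subject to a bound
% on the label-prediction loss (labels predicted FROM data only).
\min_{\theta} \; -\sum_{d=1}^{D} \log p(x_d \mid \theta)
\quad \text{subject to} \quad
-\sum_{d \in \mathcal{L}} \log p(y_d \mid x_d, \theta) \le \epsilon .

% Lagrangian relaxation actually optimized in practice:
\min_{\theta} \; -\sum_{d=1}^{D} \log p(x_d \mid \theta)
\; - \; \lambda \sum_{d \in \mathcal{L}} \log p(y_d \mid x_d, \theta).
```

Because the prediction term sums only over the labeled subset \(\mathcal{L}\), the objective remains well defined when only a small fraction of training examples carry labels, which is the semi-supervised setting the paper targets; the asymmetry the abstract mentions shows up in conditioning on \(x_d\) when scoring \(y_d\), never the reverse.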

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-hughes18a,
  title     = {Semi-Supervised Prediction-Constrained Topic Models},
  author    = {Hughes, Michael and Hope, Gabriel and Weiner, Leah and McCoy, Thomas and Perlis, Roy and Sudderth, Erik and Doshi-Velez, Finale},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {1067--1076},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/hughes18a/hughes18a.pdf},
  url       = {https://proceedings.mlr.press/v84/hughes18a.html},
  abstract  = {Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.}
}
Endnote
%0 Conference Paper
%T Semi-Supervised Prediction-Constrained Topic Models
%A Michael Hughes
%A Gabriel Hope
%A Leah Weiner
%A Thomas McCoy
%A Roy Perlis
%A Erik Sudderth
%A Finale Doshi-Velez
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-hughes18a
%I PMLR
%P 1067--1076
%U https://proceedings.mlr.press/v84/hughes18a.html
%V 84
%X Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.
APA
Hughes, M., Hope, G., Weiner, L., McCoy, T., Perlis, R., Sudderth, E. & Doshi-Velez, F. (2018). Semi-Supervised Prediction-Constrained Topic Models. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1067-1076. Available from https://proceedings.mlr.press/v84/hughes18a.html.