Semi-Supervised Prediction-Constrained Topic Models

Michael Hughes; Gabriel Hope; Leah Weiner; Thomas McCoy; Roy Perlis; Erik Sudderth; Finale Doshi-Velez

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1067-1076, 2018.

Abstract

Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.

Cite this Paper

BibTeX

@InProceedings{pmlr-v84-hughes18a,
  title = 	 {Semi-Supervised Prediction-Constrained Topic Models},
  author = 	 {Hughes, Michael and Hope, Gabriel and Weiner, Leah and McCoy, Thomas and Perlis, Roy and Sudderth, Erik and Doshi-Velez, Finale},
  booktitle = 	 {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1067--1076},
  year = 	 {2018},
  editor = 	 {Storkey, Amos and Perez-Cruz, Fernando},
  volume = 	 {84},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v84/hughes18a/hughes18a.pdf},
  url = 	 {https://proceedings.mlr.press/v84/hughes18a.html},
  abstract = 	 {Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics. }
}

EndNote

%0 Conference Paper
%T Semi-Supervised Prediction-Constrained Topic Models
%A Michael Hughes
%A Gabriel Hope
%A Leah Weiner
%A Thomas McCoy
%A Roy Perlis
%A Erik Sudderth
%A Finale Doshi-Velez
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-hughes18a
%I PMLR
%P 1067--1076
%U https://proceedings.mlr.press/v84/hughes18a.html
%V 84
%X Supervisory signals can help topic models discover low-dimensional data representations which are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by not properly handling a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.

APA

Hughes, M., Hope, G., Weiner, L., McCoy, T., Perlis, R., Sudderth, E., & Doshi-Velez, F. (2018). Semi-Supervised Prediction-Constrained Topic Models. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1067-1076. Available from https://proceedings.mlr.press/v84/hughes18a.html.
