Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment

Jason Chuang; Sonal Gupta; Christopher Manning; Jeffrey Heer

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):612-620, 2013.

Abstract

The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-chuang13,
  title = 	 {Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment},
  author = 	 {Chuang, Jason and Gupta, Sonal and Manning, Christopher and Heer, Jeffrey},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {612--620},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/chuang13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/chuang13.html},
  abstract = 	 {The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.}
}

Endnote

%0 Conference Paper
%T Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment
%A Jason Chuang
%A Sonal Gupta
%A Christopher Manning
%A Jeffrey Heer
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-chuang13
%I PMLR
%P 612--620
%U https://proceedings.mlr.press/v28/chuang13.html
%V 28
%N 3
%X The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.

RIS


TY  - CPAPER
TI  - Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment
AU  - Jason Chuang
AU  - Sonal Gupta
AU  - Christopher Manning
AU  - Jeffrey Heer
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-chuang13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 612
EP  - 620
L1  - http://proceedings.mlr.press/v28/chuang13.pdf
UR  - https://proceedings.mlr.press/v28/chuang13.html
AB  - The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
ER  -

APA


Chuang, J., Gupta, S., Manning, C. & Heer, J. (2013). Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):612-620. Available from https://proceedings.mlr.press/v28/chuang13.html.
