An Alternative Prior Process for Nonparametric Bayesian Clustering

Hanna Wallach; Shane Jensen; Lee Dicker; Katherine Heller

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:892-899, 2010.

Abstract

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
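
The predictive probabilities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard sequential forms: for the Dirichlet process an existing cluster k is chosen with probability n_k / (n + theta), for the Pitman-Yor process with probability (n_k - d) / (n + theta), and for the uniform process with probability 1 / (K + theta), where K is the current number of clusters. The function name `predictive_probs` and its parameters are hypothetical and are not taken from the paper.

```python
def predictive_probs(cluster_sizes, theta, process="dirichlet", discount=0.0):
    """Return (probabilities of joining each existing cluster, probability of a new cluster).

    cluster_sizes: list of current cluster sizes n_k (assumed non-empty clusters).
    theta: concentration parameter (assumed > 0).
    discount: Pitman-Yor discount d in [0, 1); ignored by the other processes.
    """
    n = sum(cluster_sizes)   # number of items assigned so far
    k = len(cluster_sizes)   # number of existing clusters

    if process == "dirichlet":
        # Existing clusters are weighted by size: the "rich-get-richer" effect.
        denom = n + theta
        existing = [n_k / denom for n_k in cluster_sizes]
        new = theta / denom
    elif process == "pitman-yor":
        # Same size weighting, with a discount moved to the new-cluster mass.
        denom = n + theta
        existing = [(n_k - discount) / denom for n_k in cluster_sizes]
        new = (theta + k * discount) / denom
    elif process == "uniform":
        # Every existing cluster is equally likely, regardless of its size.
        denom = k + theta
        existing = [1.0 / denom for _ in cluster_sizes]
        new = theta / denom
    else:
        raise ValueError(f"unknown process: {process}")

    return existing, new


if __name__ == "__main__":
    sizes = [8, 1, 1]
    for name in ("dirichlet", "pitman-yor", "uniform"):
        existing, new = predictive_probs(sizes, theta=1.0, process=name, discount=0.5)
        print(name, [round(p, 3) for p in existing], round(new, 3))
```

With sizes such as 8, 1, 1, the Dirichlet and Pitman-Yor rules put most of the mass on the largest cluster, while the uniform rule treats all three existing clusters identically.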

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-wallach10a,
  title = 	 {An Alternative Prior Process for Nonparametric Bayesian Clustering},
  author = 	 {Wallach, Hanna and Jensen, Shane and Dicker, Lee and Heller, Katherine},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {892--899},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/wallach10a/wallach10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/wallach10a.html},
  abstract = 	 {Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.}
}

Endnote

%0 Conference Paper
%T An Alternative Prior Process for Nonparametric Bayesian Clustering
%A Hanna Wallach
%A Shane Jensen
%A Lee Dicker
%A Katherine Heller
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-wallach10a
%I PMLR
%P 892--899
%U https://proceedings.mlr.press/v9/wallach10a.html
%V 9
%X Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.

RIS


TY  - CPAPER
TI  - An Alternative Prior Process for Nonparametric Bayesian Clustering
AU  - Hanna Wallach
AU  - Shane Jensen
AU  - Lee Dicker
AU  - Katherine Heller
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-wallach10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 892
EP  - 899
L1  - http://proceedings.mlr.press/v9/wallach10a/wallach10a.pdf
UR  - https://proceedings.mlr.press/v9/wallach10a.html
AB  - Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
ER  -

APA


Wallach, H., Jensen, S., Dicker, L. & Heller, K. (2010). An Alternative Prior Process for Nonparametric Bayesian Clustering. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:892-899. Available from https://proceedings.mlr.press/v9/wallach10a.html.
