A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views

Donglin Niu; Jennifer Dy; Zoubin Ghahramani

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:814-822, 2012.

Abstract

Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.
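The two priors named in the abstract can be illustrated with a small generative sketch. This is not the authors' model or inference procedure. It only draws from a standard Indian Buffet Process (features as customers, views as dishes, which is an assumption about orientation) and a standard Chinese Restaurant Process (one partition of the samples per view). All function names, the concentration parameters `alpha_ibp` and `alpha_crp`, and the seed are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_crp(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese Restaurant Process."""
    assignments, counts = [], []
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def sample_ibp(n_features, alpha, rng):
    """Draw a binary feature-by-view matrix from an Indian Buffet Process."""
    rows, counts = [], []  # counts[v] = features already assigned to view v
    for i in range(1, n_features + 1):
        # Join an existing view v with probability counts[v] / i.
        chosen = {v for v, m in enumerate(counts) if rng.random() < m / i}
        # Open Poisson(alpha / i) brand-new views.
        for _ in range(sample_poisson(alpha / i, rng)):
            chosen.add(len(counts))
            counts.append(0)
        for v in chosen:
            counts[v] += 1
        rows.append(chosen)
    n_views = len(counts)
    return [[1 if v in row else 0 for v in range(n_views)] for row in rows]


def sample_multiview_prior(n_samples, n_features, alpha_ibp, alpha_crp, seed=0):
    """Sample overlapping feature views, then one clustering per view."""
    rng = random.Random(seed)
    Z = sample_ibp(n_features, alpha_ibp, rng)
    n_views = len(Z[0]) if Z else 0
    clusterings = [sample_crp(n_samples, alpha_crp, rng) for _ in range(n_views)]
    return Z, clusterings


if __name__ == "__main__":
    Z, clusterings = sample_multiview_prior(10, 6, alpha_ibp=2.0, alpha_crp=1.0)
    for d, row in enumerate(Z):
        print("feature", d, "views", row)
    for v, labels in enumerate(clusterings):
        print("view", v, "cluster labels", labels)
```

A row of `Z` with more than one nonzero entry corresponds to a feature shared across views, which is the overlap the abstract refers to. Neither the number of views nor the number of clusters per view is fixed in advance, since both grow with the draws.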

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-niu12,
  title = 	 {A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views},
  author = 	 {Niu, Donglin and Dy, Jennifer and Ghahramani, Zoubin},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {814--822},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/niu12/niu12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/niu12.html},
  abstract = 	 {Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways.  Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data.  In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap.  We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process.  We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem.  Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.}
}

EndNote

%0 Conference Paper
%T A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
%A Donglin Niu
%A Jennifer Dy
%A Zoubin Ghahramani
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-niu12
%I PMLR
%P 814--822
%U https://proceedings.mlr.press/v22/niu12.html
%V 22
%X Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways.  Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data.  In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap.  We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process.  We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem.  Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.

RIS


TY  - CPAPER
TI  - A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
AU  - Donglin Niu
AU  - Jennifer Dy
AU  - Zoubin Ghahramani
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-niu12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 814
EP  - 822
L1  - http://proceedings.mlr.press/v22/niu12/niu12.pdf
UR  - https://proceedings.mlr.press/v22/niu12.html
AB  - Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways.  Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data.  In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap.  We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process.  We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem.  Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.
ER  -

APA


Niu, D., Dy, J., & Ghahramani, Z. (2012). A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:814-822. Available from https://proceedings.mlr.press/v22/niu12.html.
