A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views

Donglin Niu, Jennifer Dy, Zoubin Ghahramani
; Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:814-822, 2012.

Abstract

Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-niu12, title = {A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views}, author = {Donglin Niu and Jennifer Dy and Zoubin Ghahramani}, booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics}, pages = {814--822}, year = {2012}, editor = {Neil D. Lawrence and Mark Girolami}, volume = {22}, series = {Proceedings of Machine Learning Research}, address = {La Palma, Canary Islands}, month = {21--23 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v22/niu12/niu12.pdf}, url = {http://proceedings.mlr.press/v22/niu12.html}, abstract = {Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.} }
Endnote
%0 Conference Paper %T A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views %A Donglin Niu %A Jennifer Dy %A Zoubin Ghahramani %B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2012 %E Neil D. Lawrence %E Mark Girolami %F pmlr-v22-niu12 %I PMLR %J Proceedings of Machine Learning Research %P 814--822 %U http://proceedings.mlr.press %V 22 %W PMLR %X Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.
RIS
TY - CPAPER TI - A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views AU - Donglin Niu AU - Jennifer Dy AU - Zoubin Ghahramani BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics PY - 2012/03/21 DA - 2012/03/21 ED - Neil D. Lawrence ED - Mark Girolami ID - pmlr-v22-niu12 PB - PMLR SP - 814 DP - PMLR EP - 822 L1 - http://proceedings.mlr.press/v22/niu12/niu12.pdf UR - http://proceedings.mlr.press/v22/niu12.html AB - Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view. ER -
APA
Niu, D., Dy, J. & Ghahramani, Z.. (2012). A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in PMLR 22:814-822

Related Material