Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

Martin Azizyan, Aarti Singh, Larry Wasserman
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:37-45, 2015.

Abstract

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Further, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-azizyan15,
  title = {{Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures}},
  author = {Azizyan, Martin and Singh, Aarti and Wasserman, Larry},
  booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = {37--45},
  year = {2015},
  editor = {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = {38},
  series = {Proceedings of Machine Learning Research},
  address = {San Diego, California, USA},
  month = {09--12 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v38/azizyan15.pdf},
  url = {https://proceedings.mlr.press/v38/azizyan15.html},
  abstract = {We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Further, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.}
}
Endnote
%0 Conference Paper
%T Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
%A Martin Azizyan
%A Aarti Singh
%A Larry Wasserman
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-azizyan15
%I PMLR
%P 37--45
%U https://proceedings.mlr.press/v38/azizyan15.html
%V 38
%X We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Further, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.
RIS
TY  - CPAPER
TI  - Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
AU  - Martin Azizyan
AU  - Aarti Singh
AU  - Larry Wasserman
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-azizyan15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 37
EP  - 45
L1  - http://proceedings.mlr.press/v38/azizyan15.pdf
UR  - https://proceedings.mlr.press/v38/azizyan15.html
AB  - We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Further, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.
ER  -
APA
Azizyan, M., Singh, A., & Wasserman, L. (2015). Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:37-45. Available from https://proceedings.mlr.press/v38/azizyan15.html.
