A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data

Jinfeng Yi; Lijun Zhang; Jun Wang; Rong Jin; Anil Jain

A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data

Jinfeng Yi, Lijun Zhang, Jun Wang, Rong Jin, Anil Jain

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):658-666, 2014.

Abstract

Learning a statistical model for high-dimensional data is an important topic in machine learning. Although this problem has been well studied in the supervised setting, little is known about its unsupervised counterpart. In this work, we focus on the problem of clustering high-dimensional data with sparse centers. In particular, we address the following open question in unsupervised learning: “is it possible to reliably cluster high-dimensional data when the number of samples is smaller than the data dimensionality?" We develop an efficient clustering algorithm that is able to estimate sparse cluster centers with a single pass over the data. Our theoretical analysis shows that the proposed algorithm is able to accurately recover cluster centers with only O(s\log d) number of samples (data points), provided all the cluster centers are s-sparse vectors in a d dimensional space. Experimental results verify both the effectiveness and efficiency of the proposed clustering algorithm compared to the state-of-the-art algorithms on several benchmark datasets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-yib14,
  title = 	 {A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data},
  author = 	 {Yi, Jinfeng and Zhang, Lijun and Wang, Jun and Jin, Rong and Jain, Anil},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {658--666},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/yib14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/yib14.html},
  abstract = 	 {Learning a statistical model for high-dimensional data is an important topic in machine learning. Although this problem has been well studied in the supervised setting, little is known about its unsupervised counterpart. In this work, we focus on the problem of clustering high-dimensional data with sparse centers.  In particular, we address the following open question in unsupervised learning: “is it possible to reliably cluster high-dimensional data when the number of samples is smaller than the data dimensionality?" We develop an efficient clustering algorithm that is able to estimate sparse cluster centers with a single pass over the data. Our theoretical analysis shows that the proposed algorithm is able to accurately recover cluster centers with only O(s\log d) number of samples (data points), provided all the cluster centers are s-sparse vectors in a d dimensional space. Experimental results verify both the effectiveness and efficiency of the proposed clustering algorithm compared to the state-of-the-art algorithms on several benchmark datasets.}
}

Endnote

%0 Conference Paper
%T A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data
%A Jinfeng Yi
%A Lijun Zhang
%A Jun Wang
%A Rong Jin
%A Anil Jain
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-yib14
%I PMLR
%P 658--666
%U https://proceedings.mlr.press/v32/yib14.html
%V 32
%N 2
%X Learning a statistical model for high-dimensional data is an important topic in machine learning. Although this problem has been well studied in the supervised setting, little is known about its unsupervised counterpart. In this work, we focus on the problem of clustering high-dimensional data with sparse centers.  In particular, we address the following open question in unsupervised learning: “is it possible to reliably cluster high-dimensional data when the number of samples is smaller than the data dimensionality?" We develop an efficient clustering algorithm that is able to estimate sparse cluster centers with a single pass over the data. Our theoretical analysis shows that the proposed algorithm is able to accurately recover cluster centers with only O(s\log d) number of samples (data points), provided all the cluster centers are s-sparse vectors in a d dimensional space. Experimental results verify both the effectiveness and efficiency of the proposed clustering algorithm compared to the state-of-the-art algorithms on several benchmark datasets.

RIS


TY  - CPAPER
TI  - A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data
AU  - Jinfeng Yi
AU  - Lijun Zhang
AU  - Jun Wang
AU  - Rong Jin
AU  - Anil Jain
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-yib14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 658
EP  - 666
L1  - http://proceedings.mlr.press/v32/yib14.pdf
UR  - https://proceedings.mlr.press/v32/yib14.html
AB  - Learning a statistical model for high-dimensional data is an important topic in machine learning. Although this problem has been well studied in the supervised setting, little is known about its unsupervised counterpart. In this work, we focus on the problem of clustering high-dimensional data with sparse centers.  In particular, we address the following open question in unsupervised learning: “is it possible to reliably cluster high-dimensional data when the number of samples is smaller than the data dimensionality?" We develop an efficient clustering algorithm that is able to estimate sparse cluster centers with a single pass over the data. Our theoretical analysis shows that the proposed algorithm is able to accurately recover cluster centers with only O(s\log d) number of samples (data points), provided all the cluster centers are s-sparse vectors in a d dimensional space. Experimental results verify both the effectiveness and efficiency of the proposed clustering algorithm compared to the state-of-the-art algorithms on several benchmark datasets.
ER  -

APA


Yi, J., Zhang, L., Wang, J., Jin, R. & Jain, A.. (2014). A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):658-666 Available from https://proceedings.mlr.press/v32/yib14.html.

A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data

Abstract

Cite this Paper

Related Material