K-Means Clustering with Distributed Dimensions

Hu Ding; Yu Liu; Lingxiao Huang; Jian Li

K-Means Clustering with Distributed Dimensions

Hu Ding, Yu Liu, Lingxiao Huang, Jian Li

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1339-1348, 2016.

Abstract

Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-ding16,
  title = 	 {$K$-Means Clustering with Distributed Dimensions},
  author = 	 {Ding, Hu and Liu, Yu and Huang, Lingxiao and Li, Jian},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1339--1348},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/ding16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/ding16.html},
  abstract = 	 {Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.}
}

Endnote

%0 Conference Paper
%T K-Means Clustering with Distributed Dimensions
%A Hu Ding
%A Yu Liu
%A Lingxiao Huang
%A Jian Li
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-ding16
%I PMLR
%P 1339--1348
%U https://proceedings.mlr.press/v48/ding16.html
%V 48
%X Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.

RIS


TY  - CPAPER
TI  - K-Means Clustering with Distributed Dimensions
AU  - Hu Ding
AU  - Yu Liu
AU  - Lingxiao Huang
AU  - Jian Li
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-ding16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1339
EP  - 1348
L1  - http://proceedings.mlr.press/v48/ding16.pdf
UR  - https://proceedings.mlr.press/v48/ding16.html
AB  - Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.
ER  -

APA


Ding, H., Liu, Y., Huang, L. & Li, J.. (2016). K-Means Clustering with Distributed Dimensions. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1339-1348 Available from https://proceedings.mlr.press/v48/ding16.html.

K-Means Clustering with Distributed Dimensions

Abstract

Cite this Paper

Related Material