K-Means Clustering with Distributed Dimensions

Hu Ding, Yu Liu, Lingxiao Huang, Jian Li
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1339-1348, 2016.

Abstract

Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improves on that of existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter settings. Our experimental results show that our algorithms outperform existing algorithms on real datasets in the distributed dimension setting.
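The distributed dimension setting rests on a simple fact: squared Euclidean distance is coordinate-separable, so when the dimensions of the data are partitioned across machines, each machine can compute partial squared distances to candidate centers on its own dimension subset, and a coordinator sums these partial results to assign points to clusters. The following is a minimal illustrative sketch of that decomposition only (not the paper's algorithm, and with no attempt at its communication savings); all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, machines = 100, 12, 3, 4

X = rng.normal(size=(n, d))        # full data (conceptually never centralized)
centers = rng.normal(size=(k, d))  # candidate centers, split the same way
dim_parts = np.array_split(np.arange(d), machines)  # partition of the dimensions

# Each "machine" computes partial squared distances on its dimensions only.
partials = [
    ((X[:, dims, None] - centers[:, dims].T[None, :, :]) ** 2).sum(axis=1)
    for dims in dim_parts
]

# The coordinator sums one (n, k) table of partial distances per machine ...
dist_sq = sum(partials)
assignments = dist_sq.argmin(axis=1)

# ... and recovers exactly the centralized squared distances.
full = ((X[:, :, None] - centers.T[None, :, :]) ** 2).sum(axis=1)
assert np.allclose(dist_sq, full)
```

Each machine sends only an n-by-k table of partial distances rather than its raw coordinates, which is the basic reason communication can be decoupled from the data dimension in this setting.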

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-ding16, title = {$K$-Means Clustering with Distributed Dimensions}, author = {Ding, Hu and Liu, Yu and Huang, Lingxiao and Li, Jian}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {1339--1348}, year = {2016}, editor = {Balcan, Maria Florina and Weinberger, Kilian Q.}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/ding16.pdf}, url = {https://proceedings.mlr.press/v48/ding16.html}, abstract = {Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improves on that of existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter settings. Our experimental results show that our algorithms outperform existing algorithms on real datasets in the distributed dimension setting.} }
Endnote
%0 Conference Paper %T K-Means Clustering with Distributed Dimensions %A Hu Ding %A Yu Liu %A Lingxiao Huang %A Jian Li %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-ding16 %I PMLR %P 1339--1348 %U https://proceedings.mlr.press/v48/ding16.html %V 48 %X Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improves on that of existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter settings. Our experimental results show that our algorithms outperform existing algorithms on real datasets in the distributed dimension setting.
RIS
TY - CPAPER TI - K-Means Clustering with Distributed Dimensions AU - Hu Ding AU - Yu Liu AU - Lingxiao Huang AU - Jian Li BT - Proceedings of The 33rd International Conference on Machine Learning DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-ding16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 48 SP - 1339 EP - 1348 L1 - http://proceedings.mlr.press/v48/ding16.pdf UR - https://proceedings.mlr.press/v48/ding16.html AB - Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improves on that of existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter settings. Our experimental results show that our algorithms outperform existing algorithms on real datasets in the distributed dimension setting. ER -
APA
Ding, H., Liu, Y., Huang, L. & Li, J. (2016). K-Means Clustering with Distributed Dimensions. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1339-1348. Available from https://proceedings.mlr.press/v48/ding16.html.