Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup

Yufei Ding; Yue Zhao; Xipeng Shen; Madanlal Musuvathi; Todd Mytkowicz

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:579-587, 2015.

Abstract

This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage and leveraging efficiently maintained lower and upper bounds on the distances between a point and the centers, it avoids unnecessary distance calculations more effectively than prior algorithms. It significantly and consistently outperforms classic K-means and prior alternative K-means algorithms across all tested data sets, cluster numbers, and machine configurations. This consistent, superior performance, together with its simplicity, user control of overheads, and guarantee of producing the same clustering results as standard K-means, makes Yinyang K-means a drop-in replacement for classic K-means with an order of magnitude higher performance.
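
The bound-based filtering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps one upper bound and one lower bound per point (roughly the case of a single center group) and leaves out the initial clustering of the centers. The function names, the blob data, and the choice of starting centers are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_two(p, centers):
    """Return (index of nearest center, its distance, runner-up distance)."""
    d = [dist(p, c) for c in centers]
    best = min(range(len(d)), key=d.__getitem__)
    second = min(d[j] for j in range(len(d)) if j != best)
    return best, d[best], second


def update(points, assign, centers):
    """Move each center to the mean of its points; empty clusters stay put."""
    new = []
    for j, old in enumerate(centers):
        members = [p for p, a in zip(points, assign) if a == j]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(list(old))
    return new


def lloyd(points, centers, iters):
    """Classic K-means: every point is compared with every center."""
    centers = [list(c) for c in centers]
    assign = []
    for _ in range(iters):
        assign = [closest_two(p, centers)[0] for p in points]
        centers = update(points, assign, centers)
    return assign


def bounded_kmeans(points, centers, iters):
    """Same iterations, but skip the full scan when the bounds rule out a change."""
    centers = [list(c) for c in centers]
    assign, ub, lb = [], [], []
    for p in points:
        a, u, l = closest_two(p, centers)
        assign.append(a)
        ub.append(u)  # upper bound on distance to the assigned center
        lb.append(l)  # lower bound on distance to every other center
    skipped = 0
    for _ in range(iters - 1):
        moved = update(points, assign, centers)
        drift = [dist(c, m) for c, m in zip(centers, moved)]
        centers, max_drift = moved, max(drift)
        for i, p in enumerate(points):
            # Triangle inequality: a center that moved by `drift` changes
            # any point-to-center distance by at most `drift`.
            ub[i] += drift[assign[i]]
            lb[i] -= max_drift
            if ub[i] >= lb[i]:
                ub[i] = dist(p, centers[assign[i]])  # tighten with one distance
            if ub[i] < lb[i]:
                skipped += 1  # assigned center is provably still the nearest
                continue
            assign[i], ub[i], lb[i] = closest_two(p, centers)
    return assign, skipped


# Usage on three synthetic blobs (assumed data, fixed seed for repeatability).
random.seed(0)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [[random.gauss(cx, 1.0), random.gauss(cy, 1.0)]
          for cx, cy in blobs for _ in range(100)]
init = [points[0], points[100], points[200]]
labels, skipped = bounded_kmeans(points, init, iters=10)
print(labels == lloyd(points, init, iters=10), skipped)
```

The strict comparison `ub[i] < lb[i]` is a deliberate choice in this sketch: a point is skipped only when its current center is provably strictly closer than every other center, so ties are always resolved by a full scan, as in the classic loop.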

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-ding15,
  title = 	 {Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup},
  author = 	 {Ding, Yufei and Zhao, Yue and Shen, Xipeng and Musuvathi, Madanlal and Mytkowicz, Todd},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {579--587},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/ding15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/ding15.html},
  abstract = 	 {This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms classic K-means and prior alternative K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee in producing the same clustering results as the standard K-means does—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.}
}

Endnote

%0 Conference Paper
%T Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
%A Yufei Ding
%A Yue Zhao
%A Xipeng Shen
%A Madanlal Musuvathi
%A Todd Mytkowicz
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-ding15
%I PMLR
%P 579--587
%U https://proceedings.mlr.press/v37/ding15.html
%V 37
%X This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms classic K-means and prior alternative K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee in producing the same clustering results as the standard K-means does—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.

RIS


TY  - CPAPER
TI  - Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
AU  - Yufei Ding
AU  - Yue Zhao
AU  - Xipeng Shen
AU  - Madanlal Musuvathi
AU  - Todd Mytkowicz
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-ding15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 579
EP  - 587
L1  - http://proceedings.mlr.press/v37/ding15.pdf
UR  - https://proceedings.mlr.press/v37/ding15.html
AB  - This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms classic K-means and prior alternative K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee in producing the same clustering results as the standard K-means does—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.
ER  -

APA


Ding, Y., Zhao, Y., Shen, X., Musuvathi, M. & Mytkowicz, T. (2015). Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:579-587. Available from https://proceedings.mlr.press/v37/ding15.html.
