Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup

Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, Todd Mytkowicz
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:579-587, 2015.

Abstract

This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms classic K-means and prior alternative K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee in producing the same clustering results as the standard K-means does—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.
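
The bound-based filtering the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: as a simplifying assumption it treats all centers as a single group, keeping one upper bound (distance to the assigned center) and one lower bound (distance to every other center) per point, whereas the paper groups the centers by clustering them first. All function names here are hypothetical.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def update_centers(points, assign, centers):
    """Mean of each cluster; an empty cluster keeps its old center."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, a in zip(points, assign):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += p[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(len(centers))
    ]


def nearest_two(p, centers):
    """Return (nearest center index, its distance, second-smallest distance).

    Assumes at least two centers.
    """
    ds = [dist(p, c) for c in centers]
    best = min(range(len(ds)), key=ds.__getitem__)
    second = min(d for j, d in enumerate(ds) if j != best)
    return best, ds[best], second


def lloyd(points, centers, iters):
    """Classic K-means: every point is compared with every center."""
    assign = [nearest_two(p, centers)[0] for p in points]
    for _ in range(iters):
        centers = update_centers(points, assign, centers)
        assign = [nearest_two(p, centers)[0] for p in points]
    return assign


def bounded_kmeans(points, centers, iters):
    """Same iterations as lloyd(), skipping points whose bounds rule out a change."""
    assign, ub, lb = [], [], []
    for p in points:
        a, d1, d2 = nearest_two(p, centers)
        assign.append(a)
        ub.append(d1)  # upper bound on distance to the assigned center
        lb.append(d2)  # lower bound on distance to any other center
    skipped = 0
    for _ in range(iters):
        new_centers = update_centers(points, assign, centers)
        drift = [dist(c, n) for c, n in zip(centers, new_centers)]
        centers = new_centers
        max_drift = max(drift)
        for i, p in enumerate(points):
            # Triangle inequality: a center that moved by `drift` changes
            # each point-to-center distance by at most `drift`.
            ub[i] += drift[assign[i]]
            lb[i] -= max_drift
            if ub[i] < lb[i]:
                skipped += 1
                continue
            # Tighten the upper bound with one exact distance and retry.
            ub[i] = dist(p, centers[assign[i]])
            if ub[i] < lb[i]:
                skipped += 1
                continue
            # The bounds could not rule out a change: do the full search.
            assign[i], ub[i], lb[i] = nearest_two(p, centers)
    return assign, skipped
```

Given the same initial centers and iteration count, `bounded_kmeans` is intended to return the same assignments as `lloyd` (barring exact distance ties and floating-point edge cases), while `skipped` counts the point updates that avoided the full search over all centers.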

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-ding15,
  title     = {Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup},
  author    = {Ding, Yufei and Zhao, Yue and Shen, Xipeng and Musuvathi, Madanlal and Mytkowicz, Todd},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {579--587},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/ding15.pdf},
  url       = {http://proceedings.mlr.press/v37/ding15.html}
}
Endnote
%0 Conference Paper
%T Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
%A Yufei Ding
%A Yue Zhao
%A Xipeng Shen
%A Madanlal Musuvathi
%A Todd Mytkowicz
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-ding15
%I PMLR
%P 579--587
%U http://proceedings.mlr.press/v37/ding15.html
%V 37
RIS
TY  - CPAPER
TI  - Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
AU  - Yufei Ding
AU  - Yue Zhao
AU  - Xipeng Shen
AU  - Madanlal Musuvathi
AU  - Todd Mytkowicz
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-ding15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 579
EP  - 587
L1  - http://proceedings.mlr.press/v37/ding15.pdf
UR  - http://proceedings.mlr.press/v37/ding15.html
ER  -
APA
Ding, Y., Zhao, Y., Shen, X., Musuvathi, M. & Mytkowicz, T. (2015). Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:579-587. Available from http://proceedings.mlr.press/v37/ding15.html.
