The Complexity of k-Means Clustering when Little is Known

Robert Ganian, Thekla Hamm, Viktoriia Korchemna, Karolina Okrasa, Kirill Simonov
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6960-6987, 2022.

Abstract

In the area of data analysis and arguably even in machine learning as a whole, few approaches have been as impactful as the classical k-means clustering. Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. To obtain a more fine-grained understanding of the tractability of this clustering problem, we apply the parameterized complexity paradigm and obtain three new algorithms for k-means clustering of incomplete data: one for the clustering of bounded-domain (i.e., integer) data, and two incomparable algorithms that target real-valued data. Our approach is based on exploiting structural properties of a graphical encoding of the missing entries, and we show that tractability can be achieved using significantly less restrictive parameterizations than in the complementary case of few missing entries.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-ganian22a, title = {The Complexity of k-Means Clustering when Little is Known}, author = {Ganian, Robert and Hamm, Thekla and Korchemna, Viktoriia and Okrasa, Karolina and Simonov, Kirill}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {6960--6987}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/ganian22a/ganian22a.pdf}, url = {https://proceedings.mlr.press/v162/ganian22a.html}, abstract = {In the area of data analysis and arguably even in machine learning as a whole, few approaches have been as impactful as the classical k-means clustering. Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. To obtain a more fine-grained understanding of the tractability of this clustering problem, we apply the parameterized complexity paradigm and obtain three new algorithms for k-means clustering of incomplete data: one for the clustering of bounded-domain (i.e., integer) data, and two incomparable algorithms that target real-valued data. Our approach is based on exploiting structural properties of a graphical encoding of the missing entries, and we show that tractability can be achieved using significantly less restrictive parameterizations than in the complementary case of few missing entries.} }
Endnote
%0 Conference Paper %T The Complexity of k-Means Clustering when Little is Known %A Robert Ganian %A Thekla Hamm %A Viktoriia Korchemna %A Karolina Okrasa %A Kirill Simonov %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-ganian22a %I PMLR %P 6960--6987 %U https://proceedings.mlr.press/v162/ganian22a.html %V 162 %X In the area of data analysis and arguably even in machine learning as a whole, few approaches have been as impactful as the classical k-means clustering. Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. To obtain a more fine-grained understanding of the tractability of this clustering problem, we apply the parameterized complexity paradigm and obtain three new algorithms for k-means clustering of incomplete data: one for the clustering of bounded-domain (i.e., integer) data, and two incomparable algorithms that target real-valued data. Our approach is based on exploiting structural properties of a graphical encoding of the missing entries, and we show that tractability can be achieved using significantly less restrictive parameterizations than in the complementary case of few missing entries.
APA
Ganian, R., Hamm, T., Korchemna, V., Okrasa, K. & Simonov, K.. (2022). The Complexity of k-Means Clustering when Little is Known. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6960-6987 Available from https://proceedings.mlr.press/v162/ganian22a.html.

Related Material