Large Scale K-Median Clustering for Stable Clustering Instances
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2890-2898, 2021.
We study the problem of computing a good k-median clustering in a parallel computing environment. We design an efficient algorithm that gives a constant-factor approximation to the optimal solution for stable clustering instances. The notion of stability that we consider is resilience to perturbations of the distances between the points. Our computational experiments show that our algorithm works well in practice: it finds better clusterings than Lloyd’s algorithm and a centralized coreset construction when all methods use samples of the same size.
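For readers unfamiliar with the objective, the following minimal sketch shows the quantity that a k-median clustering tries to minimize: the sum, over all points, of the distance to the nearest chosen center. It is not the paper's algorithm. The function name `k_median_cost`, the use of Euclidean distance, and the toy data are assumptions made only for illustration, and the sketch does not model the parallel setting or the stability condition.

```python
import math

def k_median_cost(points, centers):
    """Return the k-median cost: each point pays its distance to the nearest center.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    return sum(min(math.dist(p, c) for c in centers) for p in points)

# Toy instance with two well-separated groups of points (illustrative data only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

cost = k_median_cost(points, centers)
print(cost)
```

Unlike the k-means objective that Lloyd's algorithm targets, this cost uses distances rather than squared distances, so far-away points influence it less strongly.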