Clustering in the Presence of Background Noise


Shai Ben-David, Nika Haghtalab ;
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):280-288, 2014.


We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees.

Related Material