Clustering in the Presence of Background Noise

Shai Ben-David, Nika Haghtalab
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):280-288, 2014.

Abstract

We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-ben-david14, title = {Clustering in the Presence of Background Noise}, author = {Ben-David, Shai and Haghtalab, Nika}, booktitle = {Proceedings of the 31st International Conference on Machine Learning}, pages = {280--288}, year = {2014}, editor = {Xing, Eric P. and Jebara, Tony}, volume = {32}, number = {2}, series = {Proceedings of Machine Learning Research}, address = {Bejing, China}, month = {22--24 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v32/ben-david14.pdf}, url = {https://proceedings.mlr.press/v32/ben-david14.html}, abstract = {We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees.} }
Endnote
%0 Conference Paper %T Clustering in the Presence of Background Noise %A Shai Ben-David %A Nika Haghtalab %B Proceedings of the 31st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2014 %E Eric P. Xing %E Tony Jebara %F pmlr-v32-ben-david14 %I PMLR %P 280--288 %U https://proceedings.mlr.press/v32/ben-david14.html %V 32 %N 2 %X We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees.
RIS
TY - CPAPER TI - Clustering in the Presence of Background Noise AU - Shai Ben-David AU - Nika Haghtalab BT - Proceedings of the 31st International Conference on Machine Learning DA - 2014/06/18 ED - Eric P. Xing ED - Tony Jebara ID - pmlr-v32-ben-david14 PB - PMLR DP - Proceedings of Machine Learning Research VL - 32 IS - 2 SP - 280 EP - 288 L1 - http://proceedings.mlr.press/v32/ben-david14.pdf UR - https://proceedings.mlr.press/v32/ben-david14.html AB - We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees. ER -
APA
Ben-David, S. & Haghtalab, N.. (2014). Clustering in the Presence of Background Noise. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):280-288 Available from https://proceedings.mlr.press/v32/ben-david14.html.

Related Material