Uniform Deviation Bounds for k-Means Clustering

Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:283-291, 2017.

Abstract

Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$ compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis – the normalized fourth moment which measures the “tailedness” of a distribution. We also provide improved rates under progressively stronger assumptions, namely, bounded higher moments, subgaussianity and bounded support of the underlying distribution.
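The kurtosis the abstract refers to is the fourth central moment normalized by the squared variance. A minimal Python sketch of that definition (the function name and samples are illustrative, not taken from the paper):

```python
def kurtosis(xs):
    """Normalized fourth moment: E[(X - mu)^4] / Var(X)^2."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / var ** 2

# A symmetric two-point (Rademacher) sample attains the minimum kurtosis of 1;
# heavier tails push the value up (a Gaussian has kurtosis 3).
print(kurtosis([1.0, -1.0, 1.0, -1.0]))     # -> 1.0
print(kurtosis([0.0] * 8 + [10.0, -10.0]))  # -> 5.0
```

Intuitively, the heavier the tails (the larger the kurtosis), the more samples are needed for the empirical k-Means loss to track the expected loss uniformly, which is why the paper's rates depend on this quantity.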

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-bachem17a,
  title     = {Uniform Deviation Bounds for k-Means Clustering},
  author    = {Olivier Bachem and Mario Lucic and S. Hamed Hassani and Andreas Krause},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {283--291},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/bachem17a/bachem17a.pdf},
  url       = {https://proceedings.mlr.press/v70/bachem17a.html},
  abstract  = {Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$ compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis – the normalized fourth moment which measures the “tailedness” of a distribution. We also provide improved rates under progressively stronger assumptions, namely, bounded higher moments, subgaussianity and bounded support of the underlying distribution.}
}
EndNote
%0 Conference Paper
%T Uniform Deviation Bounds for k-Means Clustering
%A Olivier Bachem
%A Mario Lucic
%A S. Hamed Hassani
%A Andreas Krause
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-bachem17a
%I PMLR
%P 283--291
%U https://proceedings.mlr.press/v70/bachem17a.html
%V 70
%X Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$ compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis – the normalized fourth moment which measures the “tailedness” of a distribution. We also provide improved rates under progressively stronger assumptions, namely, bounded higher moments, subgaussianity and bounded support of the underlying distribution.
APA
Bachem, O., Lucic, M., Hassani, S.H. & Krause, A. (2017). Uniform Deviation Bounds for k-Means Clustering. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:283-291. Available from https://proceedings.mlr.press/v70/bachem17a.html.