Uniform Deviation Bounds for k-Means Clustering
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:283-291, 2017.
Abstract
Uniform deviation bounds limit the difference between a model's expected loss and its loss on an empirical sample, uniformly over all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for unbounded loss functions. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$, compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis -- the normalized fourth moment which measures the "tailedness" of a distribution. We also provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support of the underlying distribution.
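The phenomenon the abstract describes can be illustrated with a small simulation (this is only a sketch of the concept, not the paper's method): for a fixed set of centers, the empirical k-Means loss of a sample deviates from the expected loss, and the average deviation shrinks as the sample size $m$ grows, consistent with an $O(m^{-1/2})$ rate for a distribution with bounded fourth moment. The centers and distribution below are hypothetical choices for illustration.

```python
import random
import math

random.seed(0)

# Hypothetical fixed 1-D solution with k = 2 centers (for illustration only).
centers = [-2.0, 2.0]

def cost(x, centers):
    # k-Means pointwise loss: squared distance to the nearest center.
    return min((x - c) ** 2 for c in centers)

def sample_point():
    # Standard normal via Box-Muller; all moments (incl. the fourth) are bounded.
    u1, u2 = random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

# Approximate the expected loss with one very large sample.
big = [sample_point() for _ in range(200_000)]
expected = sum(cost(x, centers) for x in big) / len(big)

def mean_deviation(m, trials=200):
    # Average |empirical loss - expected loss| over repeated samples of size m.
    total = 0.0
    for _ in range(trials):
        s = [sample_point() for _ in range(m)]
        emp = sum(cost(x, centers) for x in s) / m
        total += abs(emp - expected)
    return total / trials

d_small = mean_deviation(50)
d_large = mean_deviation(5000)
print(d_small, d_large)  # deviation shrinks as m grows, roughly like m^{-1/2}
```

Note that this checks the deviation only for one fixed set of centers; the paper's bounds are uniform, i.e. they hold simultaneously for all candidate solutions, which is the harder statement.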