Uniform Deviation Bounds for k-Means Clustering
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:283-291, 2017.
Abstract
Uniform deviation bounds limit the difference between a model's expected loss and its loss on an empirical sample, uniformly over all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for unbounded loss functions. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$, compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis -- the normalized fourth moment which measures the "tailedness" of a distribution. We also provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support of the underlying distribution.
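The phenomenon the abstract describes can be illustrated with a small simulation (this is only a sketch of the concept, not the paper's method): for a fixed set of centers, the empirical k-Means loss of a sample deviates from the expected loss, and the average deviation shrinks as the sample size $m$ grows, consistent with an $O(m^{-1/2})$ rate for a distribution with bounded fourth moment. The centers and distribution below are hypothetical choices for illustration.

```python
import random
import math

random.seed(0)

# Hypothetical fixed 1-D solution with k = 2 centers (for illustration only).
centers = [-2.0, 2.0]

def cost(x, centers):
    # k-Means pointwise loss: squared distance to the nearest center.
    return min((x - c) ** 2 for c in centers)

def sample_point():
    # Standard normal via Box-Muller; all moments (incl. the fourth) are bounded.
    u1, u2 = random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

# Approximate the expected loss with one very large sample.
big = [sample_point() for _ in range(200_000)]
expected = sum(cost(x, centers) for x in big) / len(big)

def mean_deviation(m, trials=200):
    # Average |empirical loss - expected loss| over repeated samples of size m.
    total = 0.0
    for _ in range(trials):
        s = [sample_point() for _ in range(m)]
        emp = sum(cost(x, centers) for x in s) / m
        total += abs(emp - expected)
    return total / trials

d_small = mean_deviation(50)
d_large = mean_deviation(5000)
print(d_small, d_large)  # deviation shrinks as m grows, roughly like m^{-1/2}
```

Note that this checks the deviation only for one fixed set of centers; the paper's bounds are uniform, i.e. they hold simultaneously for all candidate solutions, which is the harder statement.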