Convergence Rate of Stochastic k-means
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1495-1503, 2017.
We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.