Convergence Rate of Stochastic k-means
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1495-1503, 2017.
Abstract
We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd's algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards "local optima" at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.
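To make the setting concrete, here is a minimal sketch of a mini-batch k-means update in the style of Sculley (2010). The per-center learning rate $1/n_c$ (where $n_c$ counts the points ever assigned to center $c$) is the decaying step size underlying the $O(1/t)$ analysis; function names and parameters below are illustrative, not from the paper.

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, n_iters=100, seed=0):
    """Illustrative mini-batch k-means (Sculley, 2010 style).

    Each center takes a convex step toward its assigned batch points
    with learning rate 1/n_c, where n_c is the number of points ever
    assigned to center c, so step sizes decay like 1/t.
    """
    rng = np.random.default_rng(seed)
    # initialize centers from random data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # per-center assignment counts
    for _ in range(n_iters):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # assign each batch point to its nearest center
        dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for x, c in zip(batch, assign):
            counts[c] += 1
            eta = 1.0 / counts[c]                  # decaying step size
            centers[c] += eta * (x - centers[c])   # convex step toward x
    return centers
```

Because each update is a convex combination of the current center and a data point, every center stays inside the convex hull of the initialization and the data, which is one of the geometric facts the paper's analysis exploits.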