Convergence Rate of Stochastic k-means

Cheng Tang, Claire Monteleoni
; Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1495-1503, 2017.

Abstract

We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-tang17b, title = {{Convergence rate of stochastic k-means}}, author = {Cheng Tang and Claire Monteleoni}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics}, pages = {1495--1503}, year = {2017}, editor = {Aarti Singh and Jerry Zhu}, volume = {54}, series = {Proceedings of Machine Learning Research}, address = {Fort Lauderdale, FL, USA}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/tang17b/tang17b.pdf}, url = {http://proceedings.mlr.press/v54/tang17b.html}, abstract = {We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.} }
Endnote
%0 Conference Paper %T Convergence Rate of Stochastic k-means %A Cheng Tang %A Claire Monteleoni %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-tang17b %I PMLR %J Proceedings of Machine Learning Research %P 1495--1503 %U http://proceedings.mlr.press %V 54 %W PMLR %X We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.
APA
Tang, C. & Monteleoni, C.. (2017). Convergence Rate of Stochastic k-means. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in PMLR 54:1495-1503

Related Material