Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery

Ashkan Panahi, Devdatt Dubhashi, Fredrik D. Johansson, Chiranjib Bhattacharyya
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2769-2777, 2017.

Abstract

Standard clustering methods such as K-means, Gaussian mixture models, and hierarchical clustering are beset by local minima, which are sometimes drastically suboptimal. Moreover, the number of clusters K must be known in advance. The recently introduced sum-of-norms (SON) or Clusterpath convex relaxation of k-means and hierarchical clustering shrinks cluster centroids toward one another and ensures a unique global minimizer. We give a scalable stochastic incremental algorithm based on proximal iterations to solve the SON problem with convergence guarantees. We also show that the algorithm recovers clusters under quite general conditions which have a similar form to the unifying proximity condition introduced in the approximation algorithms community (that covers paradigm cases such as Gaussian mixtures and planted partition models). We give experimental results to confirm that our algorithm scales much better than previous methods while producing clusters of comparable quality.
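
The abstract does not spell out the SON objective or the proximal step. As a rough illustration only, the sketch below assumes the standard SON/Clusterpath form, 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||_2, and shows the closed-form proximal operator of a single pairwise term. The names son_objective, pair_prox, lam, and t are hypothetical, and this is not the paper's algorithm, only the kind of building block a proximal scheme could apply to one pair of centroids at a time.

```python
import math
from itertools import combinations


def son_objective(points, centroids, lam):
    """Assumed SON objective: 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||_2."""
    fidelity = 0.5 * sum(math.dist(x, u) ** 2 for x, u in zip(points, centroids))
    fusion = sum(math.dist(u, v) for u, v in combinations(centroids, 2))
    return fidelity + lam * fusion


def pair_prox(u_i, u_j, t):
    """Proximal operator of t * ||u_i - u_j||_2 applied to the pair (u_i, u_j).

    The midpoint of the pair is preserved; the difference is shrunk by
    group soft-thresholding and becomes exactly zero once ||u_i - u_j|| <= 2t.
    """
    gap = math.dist(u_i, u_j)
    # Shrink factor for the difference vector; 0 means the two centroids fuse.
    scale = max(0.0, 1.0 - 2.0 * t / gap) if gap > 0 else 0.0
    new_i, new_j = [], []
    for a, b in zip(u_i, u_j):
        mid, half = 0.5 * (a + b), 0.5 * scale * (a - b)
        new_i.append(mid + half)
        new_j.append(mid - half)
    return new_i, new_j
```

For example, pair_prox([0.0, 0.0], [4.0, 0.0], 1.0) pulls the two centroids to [1.0, 0.0] and [3.0, 0.0], while a step of t = 2.0 fuses them at the midpoint [2.0, 0.0]. This fusing behaviour is what lets the number of clusters emerge from the penalty weight rather than being fixed in advance.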

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-panahi17a,
  title = {Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery},
  author = {Ashkan Panahi and Devdatt Dubhashi and Fredrik D. Johansson and Chiranjib Bhattacharyya},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages = {2769--2777},
  year = {2017},
  editor = {Precup, Doina and Teh, Yee Whye},
  volume = {70},
  series = {Proceedings of Machine Learning Research},
  month = {06--11 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v70/panahi17a/panahi17a.pdf},
  url = {https://proceedings.mlr.press/v70/panahi17a.html},
  abstract = {Standard clustering methods such as K-means, Gaussian mixture models, and hierarchical clustering are beset by local minima, which are sometimes drastically suboptimal. Moreover, the number of clusters K must be known in advance. The recently introduced sum-of-norms (SON) or Clusterpath convex relaxation of k-means and hierarchical clustering shrinks cluster centroids toward one another and ensures a unique global minimizer. We give a scalable stochastic incremental algorithm based on proximal iterations to solve the SON problem with convergence guarantees. We also show that the algorithm recovers clusters under quite general conditions which have a similar form to the unifying proximity condition introduced in the approximation algorithms community (that covers paradigm cases such as Gaussian mixtures and planted partition models). We give experimental results to confirm that our algorithm scales much better than previous methods while producing clusters of comparable quality.}
}
Endnote
%0 Conference Paper
%T Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery
%A Ashkan Panahi
%A Devdatt Dubhashi
%A Fredrik D. Johansson
%A Chiranjib Bhattacharyya
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-panahi17a
%I PMLR
%P 2769--2777
%U https://proceedings.mlr.press/v70/panahi17a.html
%V 70
%X Standard clustering methods such as K-means, Gaussian mixture models, and hierarchical clustering are beset by local minima, which are sometimes drastically suboptimal. Moreover, the number of clusters K must be known in advance. The recently introduced sum-of-norms (SON) or Clusterpath convex relaxation of k-means and hierarchical clustering shrinks cluster centroids toward one another and ensures a unique global minimizer. We give a scalable stochastic incremental algorithm based on proximal iterations to solve the SON problem with convergence guarantees. We also show that the algorithm recovers clusters under quite general conditions which have a similar form to the unifying proximity condition introduced in the approximation algorithms community (that covers paradigm cases such as Gaussian mixtures and planted partition models). We give experimental results to confirm that our algorithm scales much better than previous methods while producing clusters of comparable quality.
APA
Panahi, A., Dubhashi, D., Johansson, F. D. & Bhattacharyya, C. (2017). Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2769-2777. Available from https://proceedings.mlr.press/v70/panahi17a.html.

Related Material