Clustering Semi-Random Mixtures of Gaussians

Aravindan Vijayaraghavan; Pranjal Awasthi

Clustering Semi-Random Mixtures of Gaussians

Aravindan Vijayaraghavan, Pranjal Awasthi

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5055-5064, 2018.

Abstract

Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd’s algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-vijayaraghavan18a,
  title = 	 {Clustering Semi-Random Mixtures of {G}aussians},
  author =       {Vijayaraghavan, Aravindan and Awasthi, Pranjal},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {5055--5064},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/vijayaraghavan18a/vijayaraghavan18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/vijayaraghavan18a.html},
  abstract = 	 {Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd’s algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.}
}

Endnote

%0 Conference Paper
%T Clustering Semi-Random Mixtures of Gaussians
%A Aravindan Vijayaraghavan
%A Pranjal Awasthi
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-vijayaraghavan18a
%I PMLR
%P 5055--5064
%U https://proceedings.mlr.press/v80/vijayaraghavan18a.html
%V 80
%X Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd’s algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.

APA


Vijayaraghavan, A. & Awasthi, P.. (2018). Clustering Semi-Random Mixtures of Gaussians. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5055-5064 Available from https://proceedings.mlr.press/v80/vijayaraghavan18a.html.

Clustering Semi-Random Mixtures of Gaussians

Abstract

Cite this Paper

Related Material