Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means

Dennis Forster; Jörg Lücke

Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means

Dennis Forster, Jörg Lücke

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:124-132, 2018.

Abstract

One iteration of standard k-means (i.e., Lloyd’s algorithm) or standard EM for Gaussian mixture models (GMMs) scales linearly with the number of clusters C, data points N, and data dimensionality D. In this study, we explore whether one iteration of k-means or EM for GMMs can scale sublinearly with C at run-time, while improving the clustering objective remains effective. The tool we apply for complexity reduction is variational EM, which is typically used to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is to use a partial variational E-step which reduces the linear complexity of O(NCD) required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on C can be reduced to a dependency on a much smaller parameter G which relates to cluster neighborhood relations. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with O(NG²D), and variational k-means, scaling with O(NGD) per iteration. Empirical results show that these algorithms still require comparable numbers of iterations to improve the clustering objective to same values as k-means. For data with many clusters, we consequently observe reductions of net computational demands between two and three orders of magnitude. More generally, our results provide substantial empirical evidence in favor of clustering to scale sublinearly with C.

Cite this Paper

BibTeX

@InProceedings{pmlr-v84-forster18a,
  title = 	 {Can clustering scale sublinearly with its clusters? A variational EM acceleration of {GMM}s and k-means},
  author = 	 {Forster, Dennis and Lücke, Jörg},
  booktitle = 	 {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = 	 {124--132},
  year = 	 {2018},
  editor = 	 {Storkey, Amos and Perez-Cruz, Fernando},
  volume = 	 {84},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v84/forster18a/forster18a.pdf},
  url = 	 {https://proceedings.mlr.press/v84/forster18a.html},
  abstract = 	 {One iteration of standard k-means (i.e., Lloyd’s algorithm) or standard EM for Gaussian mixture models (GMMs) scales linearly with the number of clusters C, data points N, and data dimensionality D. In this study, we explore whether one iteration of k-means or EM for GMMs can scale sublinearly with C at run-time, while improving the clustering objective remains effective. The tool we apply for complexity reduction is variational EM, which is typically used to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is to use a partial variational E-step which reduces the linear complexity of O(NCD) required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on C can be reduced to a dependency on a much smaller parameter G which relates to cluster neighborhood relations. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with O(NG²D), and variational k-means, scaling with O(NGD) per iteration. Empirical results show that these algorithms still require comparable numbers of iterations to improve the clustering objective to same values as k-means. For data with many clusters, we consequently observe reductions of net computational demands between two and three orders of magnitude. More generally, our results provide substantial empirical evidence in favor of clustering to scale sublinearly with C.}
}

Endnote

%0 Conference Paper
%T Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means
%A Dennis Forster
%A Jörg Lücke
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz	
%F pmlr-v84-forster18a
%I PMLR
%P 124--132
%U https://proceedings.mlr.press/v84/forster18a.html
%V 84
%X One iteration of standard k-means (i.e., Lloyd’s algorithm) or standard EM for Gaussian mixture models (GMMs) scales linearly with the number of clusters C, data points N, and data dimensionality D. In this study, we explore whether one iteration of k-means or EM for GMMs can scale sublinearly with C at run-time, while improving the clustering objective remains effective. The tool we apply for complexity reduction is variational EM, which is typically used to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is to use a partial variational E-step which reduces the linear complexity of O(NCD) required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on C can be reduced to a dependency on a much smaller parameter G which relates to cluster neighborhood relations. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with O(NG²D), and variational k-means, scaling with O(NGD) per iteration. Empirical results show that these algorithms still require comparable numbers of iterations to improve the clustering objective to same values as k-means. For data with many clusters, we consequently observe reductions of net computational demands between two and three orders of magnitude. More generally, our results provide substantial empirical evidence in favor of clustering to scale sublinearly with C.

APA

Forster, D. & Lücke, J.. (2018). Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:124-132 Available from https://proceedings.mlr.press/v84/forster18a.html.

Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means

Abstract

Cite this Paper

Related Material