Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities

Jerry Li; Ludwig Schmidt

Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities

Jerry Li, Ludwig Schmidt

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1302-1382, 2017.

Abstract

Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$-distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emphnearly-optimal for any fixed $k$. It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$. However, the best prior \emphrunning time for this problem is $\widetilde{O}(1 / ε^3k-1)$; in particular, the dependence between $1/ε$ and $k$ is exponential. In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. Specifically, the running time of our algorithm is $(k ⋅\log 1/ ε)^O(k^4) + \widetilde{O}(k / ε^2)$. For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emphnearly-linear in the number of samples drawn. Achieving a running time of $\text{poly}(k, 1 / ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities. We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller sub-problem. While tihs sub-problem is still non-convex, its size depends only logarithmically on the final accuracy $ε$. Hence we can invoke computationally expensive methods for solving the sub-problem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians. A variant of our approach is also within reach of modern computer algebra systems. Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-li17a,
  title = 	 {Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities},
  author = 	 {Li, Jerry and Schmidt, Ludwig},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {1302--1382},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/li17a/li17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/li17a.html},
  abstract = 	 {Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$-distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emphnearly-optimal for any fixed $k$.  It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$.  However, the best prior \emphrunning time for this problem is $\widetilde{O}(1 / ε^3k-1)$; in particular, the dependence between $1/ε$ and $k$ is exponential.  In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately.  Specifically, the running time of our algorithm is $(k ⋅\log 1/ ε)^O(k^4) + \widetilde{O}(k / ε^2)$.  For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emphnearly-linear in the number of samples drawn.  Achieving a running time of $\text{poly}(k, 1 / ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities.  We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller sub-problem.  While tihs sub-problem is still non-convex, its size depends only logarithmically on the final accuracy $ε$.  Hence we can invoke computationally expensive methods for solving the sub-problem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians.  A variant of our approach is also within reach of modern computer algebra systems.  Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.}
}

Endnote

%0 Conference Paper
%T Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
%A Jerry Li
%A Ludwig Schmidt
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir	
%F pmlr-v65-li17a
%I PMLR
%P 1302--1382
%U https://proceedings.mlr.press/v65/li17a.html
%V 65
%X Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$-distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emphnearly-optimal for any fixed $k$.  It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$.  However, the best prior \emphrunning time for this problem is $\widetilde{O}(1 / ε^3k-1)$; in particular, the dependence between $1/ε$ and $k$ is exponential.  In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately.  Specifically, the running time of our algorithm is $(k ⋅\log 1/ ε)^O(k^4) + \widetilde{O}(k / ε^2)$.  For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emphnearly-linear in the number of samples drawn.  Achieving a running time of $\text{poly}(k, 1 / ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities.  We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller sub-problem.  While tihs sub-problem is still non-convex, its size depends only logarithmically on the final accuracy $ε$.  Hence we can invoke computationally expensive methods for solving the sub-problem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians.  A variant of our approach is also within reach of modern computer algebra systems.  Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.

APA


Li, J. & Schmidt, L.. (2017). Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1302-1382 Available from https://proceedings.mlr.press/v65/li17a.html.

Related Material

Download PDF