Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1302–1382, 2017.
Abstract
Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$ distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emph{nearly optimal} for any fixed $k$. It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$. However, the best prior \emph{running time} for this problem is $\widetilde{O}(1 / ε^{3k-1})$; in particular, the dependence between $1/ε$ and $k$ is exponential. In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. Specifically, the running time of our algorithm is $(k \cdot \log 1/ε)^{O(k^4)} + \widetilde{O}(k / ε^2)$. For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emph{nearly linear} in the number of samples drawn. Achieving a running time of $\text{poly}(k, 1/ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question.

Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities. We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller subproblem. While this subproblem is still non-convex, its size depends only logarithmically on the final accuracy $ε$. Hence we can invoke computationally expensive methods for solving the subproblem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians.
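To make the connection to polynomial inequalities concrete, here is a minimal toy sketch (not the paper's algorithm): when the component densities are held fixed and only the mixing weight $w$ of a 2-GMM is unknown, the closeness constraints $|w f_1(x) + (1-w) f_2(x) - \hat{f}(x)| \le ε$ at finitely many points are degree-1 polynomial inequalities in $w$. The components $\mathcal{N}(-2,1)$, $\mathcal{N}(2,1)$, the grid, and the brute-force scan below are all illustrative assumptions; the paper handles unknown means and variances via piecewise polynomial approximation and far more efficient solvers.

```python
import math

def gauss(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def feasible_weights(fhat, grid, eps, step=0.001):
    """Toy feasibility search for the mixing weight of a 2-GMM.

    Components are fixed (hypothetically) to N(-2, 1) and N(2, 1), so the
    constraints |w*f1(x) + (1-w)*f2(x) - fhat(x)| <= eps are linear
    (degree-1 polynomial) inequalities in w. We scan a grid of candidate
    weights instead of invoking a real polynomial-system solver.
    """
    solutions = []
    w = 0.0
    while w <= 1.0:
        if all(abs(w * gauss(x, -2, 1) + (1 - w) * gauss(x, 2, 1) - fhat(x)) <= eps
               for x in grid):
            solutions.append(w)
        w += step
    return solutions
```

A feasible point of the system is a valid hypothesis; in the paper's reduction, the size of the system depends only logarithmically on $1/ε$, which is what permits expensive solvers.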
A variant of our approach is also within reach of modern computer algebra systems. Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.
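For reference, the EM baseline mentioned above can be sketched in a few lines for a univariate 2-GMM. This is a generic textbook EM, not the experimental setup from the paper; the quantile-based initialization and the variance floor are illustrative choices.

```python
import math
import random

def em_2gmm(samples, iters=200):
    """Plain EM for a univariate 2-component Gaussian mixture.

    Returns (weights, means, variances). Initialization from the 25th and
    75th percentiles is a heuristic, not taken from the paper.
    """
    xs = sorted(samples)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var[j] = sum(r[j] * (x - mu[j]) ** 2
                         for r, x in zip(resp, samples)) / nj
            var[j] = max(var[j], 1e-6)  # guard against variance collapse
    return w, mu, var
```

EM maximizes likelihood locally and is sensitive to initialization and noise, which is the regime where the polynomial-inequality approach is reported to help.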