Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities

[edit]

Jerry Li, Ludwig Schmidt ;
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1302-1382, 2017.

Abstract

Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$-distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emphnearly-optimal for any fixed $k$. It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$. However, the best prior \emphrunning time for this problem is $\widetilde{O}(1 / ε^3k-1)$; in particular, the dependence between $1/ε$ and $k$ is exponential. In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. Specifically, the running time of our algorithm is $(k ⋅\log 1/ ε)^O(k^4) + \widetilde{O}(k / ε^2)$. For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emphnearly-linear in the number of samples drawn. Achieving a running time of $\text{poly}(k, 1 / ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities. We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller sub-problem. While tihs sub-problem is still non-convex, its size depends only logarithmically on the final accuracy $ε$. Hence we can invoke computationally expensive methods for solving the sub-problem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians. A variant of our approach is also within reach of modern computer algebra systems. Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.

Related Material