Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1302–1382, 2017.
Abstract
Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$ distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emph{nearly optimal} for any fixed $k$. It is well known that the sample complexity of properly learning a univariate $k$-GMM is $O(k / ε^2)$. However, the best prior \emph{running time} for this problem is $\widetilde{O}(1 / ε^{3k-1})$; in particular, the dependence between $1/ε$ and $k$ is exponential. In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. Specifically, the running time of our algorithm is $(k \cdot \log 1/ε)^{O(k^4)} + \widetilde{O}(k / ε^2)$. For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emph{nearly linear} in the number of samples drawn. Achieving a running time of $\text{poly}(k, 1/ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question.

Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities. We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller subproblem. While this subproblem is still non-convex, its size depends only logarithmically on the final accuracy $ε$. Hence we can invoke computationally expensive methods for solving the subproblem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians.
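To make the connection to polynomial inequalities concrete, here is a minimal toy sketch (not the paper's algorithm): when the component densities are held fixed and only the mixing weight $w$ of a 2-GMM is unknown, the closeness constraints $|w f_1(x) + (1-w) f_2(x) - \hat{f}(x)| \le ε$ at finitely many points are degree-1 polynomial inequalities in $w$. The components $\mathcal{N}(-2,1)$, $\mathcal{N}(2,1)$, the grid, and the brute-force scan below are all illustrative assumptions; the paper handles unknown means and variances via piecewise polynomial approximation and far more efficient solvers.

```python
import math

def gauss(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def feasible_weights(fhat, grid, eps, step=0.001):
    """Toy feasibility search for the mixing weight of a 2-GMM.

    Components are fixed (hypothetically) to N(-2, 1) and N(2, 1), so the
    constraints |w*f1(x) + (1-w)*f2(x) - fhat(x)| <= eps are linear
    (degree-1 polynomial) inequalities in w. We scan a grid of candidate
    weights instead of invoking a real polynomial-system solver.
    """
    solutions = []
    w = 0.0
    while w <= 1.0:
        if all(abs(w * gauss(x, -2, 1) + (1 - w) * gauss(x, 2, 1) - fhat(x)) <= eps
               for x in grid):
            solutions.append(w)
        w += step
    return solutions
```

A feasible point of the system is a valid hypothesis; in the paper's reduction, the size of the system depends only logarithmically on $1/ε$, which is what permits expensive solvers.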
A variant of our approach is also within reach of modern computer algebra systems. Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.
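For reference, the EM baseline mentioned above can be sketched in a few lines for a univariate 2-GMM. This is a generic textbook EM, not the experimental setup from the paper; the quantile-based initialization and the variance floor are illustrative choices.

```python
import math
import random

def em_2gmm(samples, iters=200):
    """Plain EM for a univariate 2-component Gaussian mixture.

    Returns (weights, means, variances). Initialization from the 25th and
    75th percentiles is a heuristic, not taken from the paper.
    """
    xs = sorted(samples)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var[j] = sum(r[j] * (x - mu[j]) ** 2
                         for r, x in zip(resp, samples)) / nj
            var[j] = max(var[j], 1e-6)  # guard against variance collapse
    return w, mu, var
```

EM maximizes likelihood locally and is sensitive to initialization and noise, which is the regime where the polynomial-inequality approach is reported to help.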