Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

Constantinos Daskalakis, Gautam Kamath
Proceedings of The 27th Conference on Learning Theory, PMLR 35:1183-1213, 2014.

Abstract

We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially on these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires $O(\log N/\varepsilon^2)$ samples from the unknown distribution and $O(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use.
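For illustration, the classic pairwise Scheffé comparison that such hypothesis-selection tournaments build on works as follows: for two candidate densities p and q, form the Scheffé set A = {x : p(x) > q(x)}, and declare the winner to be the candidate whose mass on A is closer to the empirical mass of A under the samples. The sketch below is a minimal rendering of this single comparison (it is not the paper's improved quasilinear tournament); all function and variable names are illustrative, and the candidate densities are integrated numerically on a one-dimensional grid.

# Minimal sketch of the classic pairwise Scheffe comparison referenced in the abstract.
# Names here are illustrative and not taken from the paper.
import numpy as np
from scipy.stats import norm

def scheffe_winner(p_pdf, q_pdf, samples, grid):
    """Classic pairwise Scheffe test: return 0 if candidate p wins, 1 if q wins."""
    dx = grid[1] - grid[0]
    on_A = p_pdf(grid) > q_pdf(grid)                      # Scheffe set A = {x : p(x) > q(x)} on the grid
    p_mass = np.sum(p_pdf(grid)[on_A]) * dx               # p(A), approximated by a Riemann sum
    q_mass = np.sum(q_pdf(grid)[on_A]) * dx               # q(A)
    emp_mass = np.mean(p_pdf(samples) > q_pdf(samples))   # empirical mass of A under the unknown distribution
    # Whichever candidate assigns mass to A closest to the empirical mass wins.
    return 0 if abs(p_mass - emp_mass) <= abs(q_mass - emp_mass) else 1

# Toy usage: samples actually come from N(0,1), so the N(0,1) candidate should win.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=2000)
grid = np.linspace(-10.0, 10.0, 20001)
print(scheffe_winner(norm(0, 1).pdf, norm(2, 1).pdf, samples, grid))   # expected output: 0

A full tournament would run such comparisons over the N candidates; the abstract's contribution is doing this with only O(log N/ε²) samples and O(N log N/ε²) time rather than the quadratic-in-N running time of the straightforward all-pairs Scheffé estimator.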

Cite this Paper


BibTeX
@InProceedings{pmlr-v35-daskalakis14,
  title     = {Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians},
  author    = {Daskalakis, Constantinos and Kamath, Gautam},
  booktitle = {Proceedings of The 27th Conference on Learning Theory},
  pages     = {1183--1213},
  year      = {2014},
  editor    = {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume    = {35},
  series    = {Proceedings of Machine Learning Research},
  address   = {Barcelona, Spain},
  month     = {13--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v35/daskalakis14.pdf},
  url       = {https://proceedings.mlr.press/v35/daskalakis14.html}
}
APA
Daskalakis, C. & Kamath, G. (2014). Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:1183-1213. Available from https://proceedings.mlr.press/v35/daskalakis14.html.
