On Learning Distributions from their Samples

Sudeep Kamath; Alon Orlitsky; Dheeraj Pichapati; Ananda Theertha Suresh

On Learning Distributions from their Samples

Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, Ananda Theertha Suresh

Proceedings of The 28th Conference on Learning Theory, PMLR 40:1066-1100, 2015.

Abstract

One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples. Surprisingly, this question has so far been resolved for only one loss, the KL-divergence and even in this case, the estimator used is ad hoc and not well understood. We study distribution approximations for general loss measures. For \ell_2^2 we determine the best approximation possible, for \ell_1 and χ^2 we derive tight bounds on the best approximation, and when the probabilities are bounded away from zero, we resolve the question for all sufficiently smooth loss measures, thereby providing a coherent understanding of the rate at which distributions can be approximated from their samples.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Kamath15,
  title = 	 {On Learning Distributions from their Samples},
  author = 	 {Kamath, Sudeep and Orlitsky, Alon and Pichapati, Dheeraj and Suresh, Ananda Theertha},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {1066--1100},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Kamath15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Kamath15.html},
  abstract = 	 {One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples. Surprisingly, this question has so far been resolved for only one loss, the KL-divergence and even in this case, the estimator used is ad hoc and not well understood. We study distribution approximations for general loss measures.  For \ell_2^2 we determine the best approximation possible, for \ell_1 and χ^2 we derive tight bounds on the best approximation, and when the probabilities are bounded away from zero, we resolve the question for all sufficiently smooth loss measures, thereby providing a coherent understanding of the rate at which distributions can be approximated from their samples.}
}

Endnote

%0 Conference Paper
%T On Learning Distributions from their Samples
%A Sudeep Kamath
%A Alon Orlitsky
%A Dheeraj Pichapati
%A Ananda Theertha Suresh
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Kamath15
%I PMLR
%P 1066--1100
%U https://proceedings.mlr.press/v40/Kamath15.html
%V 40
%X One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples. Surprisingly, this question has so far been resolved for only one loss, the KL-divergence and even in this case, the estimator used is ad hoc and not well understood. We study distribution approximations for general loss measures.  For \ell_2^2 we determine the best approximation possible, for \ell_1 and χ^2 we derive tight bounds on the best approximation, and when the probabilities are bounded away from zero, we resolve the question for all sufficiently smooth loss measures, thereby providing a coherent understanding of the rate at which distributions can be approximated from their samples.

RIS


TY  - CPAPER
TI  - On Learning Distributions from their Samples
AU  - Sudeep Kamath
AU  - Alon Orlitsky
AU  - Dheeraj Pichapati
AU  - Ananda Theertha Suresh
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Kamath15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 1066
EP  - 1100
L1  - http://proceedings.mlr.press/v40/Kamath15.pdf
UR  - https://proceedings.mlr.press/v40/Kamath15.html
AB  - One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples. Surprisingly, this question has so far been resolved for only one loss, the KL-divergence and even in this case, the estimator used is ad hoc and not well understood. We study distribution approximations for general loss measures.  For \ell_2^2 we determine the best approximation possible, for \ell_1 and χ^2 we derive tight bounds on the best approximation, and when the probabilities are bounded away from zero, we resolve the question for all sufficiently smooth loss measures, thereby providing a coherent understanding of the rate at which distributions can be approximated from their samples.
ER  -

APA


Kamath, S., Orlitsky, A., Pichapati, D. & Suresh, A.T.. (2015). On Learning Distributions from their Samples. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1066-1100 Available from https://proceedings.mlr.press/v40/Kamath15.html.

Related Material

Download PDF