Probability Inequalities for Kernel Embeddings in Sampling without Replacement
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:66-74, 2016.
The \emph{kernel embedding} of distributions is a popular machine learning technique for manipulating probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate computed from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute, and approximate, constant-time solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, a scheme known as \emph{sampling without replacement}. In this work we generalize the results of (Serfling 1974) to quantify the difference between these two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach-space-valued martingales in the setting of sampling without replacement.
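The two estimates being compared can be illustrated with a minimal sketch (not the paper's construction): the empirical kernel mean embedding $\hat\mu(t) = \frac{1}{n}\sum_i k(x_i, t)$ computed once from a full dataset and once from a uniform random subset drawn without replacement. The RBF kernel, sample sizes, and evaluation grid below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(d * d, axis=-1))

def empirical_embedding(sample, eval_points, gamma=1.0):
    # Empirical kernel mean embedding evaluated at eval_points:
    # mu_hat(t) = (1/n) * sum_i k(x_i, t)
    return rbf_kernel(sample, eval_points, gamma).mean(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 1))          # full dataset (n large)
eval_points = np.linspace(-3.0, 3.0, 7)[:, None]

# Embedding from the full dataset (costly when n is large)
mu_full = empirical_embedding(data, eval_points)

# Embedding from a subset of size m << n, sampled without replacement
subset = data[rng.choice(len(data), size=500, replace=False)]
mu_sub = empirical_embedding(subset, eval_points)

# The deviation between the two estimates is what the paper's
# probability inequalities control.
deviation = np.max(np.abs(mu_full - mu_sub))
print(deviation)
```

Here the deviation is small because the subsample average concentrates around the full-sample average; the paper's results bound such deviations with high probability in the general Banach-space-valued setting.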