Probability Inequalities for Kernel Embeddings in Sampling without Replacement

Markus Schneider
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:66-74, 2016.

Abstract

The kernel embedding of distributions is a popular machine learning technique for manipulating probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate computed from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute, and approximate, constant-time solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, a scheme known as sampling without replacement. In this work we generalize the results of Serfling (1974) to quantify the difference between these two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach-space-valued martingales in the setting of sampling without replacement.
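For intuition, the following minimal NumPy sketch (not part of the paper; the Gaussian RBF kernel, the sample sizes, and all function names are illustrative assumptions) computes the two estimates the abstract compares: the empirical kernel embedding of the full dataset of size N and the one obtained from a subsample of size n drawn without replacement. Their RKHS distance, whose deviations the paper's inequalities control, can be evaluated in closed form via the kernel trick.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def embedding_gap(X, idx, gamma=1.0):
    # RKHS distance || mu_hat_N - mu_hat_n ||_H between the empirical kernel
    # embedding of the full sample X (size N) and that of the subsample X[idx]
    # (size n), computed without forming explicit feature maps:
    # ||mu_N - mu_n||^2 = mean(Kxx) - 2 mean(Kxy) + mean(Kyy).
    Y = X[idx]
    Kxx = rbf_kernel(X, X, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    sq = Kxx.mean() - 2 * Kxy.mean() + Kyy.mean()
    return np.sqrt(max(sq, 0.0))  # guard against tiny negative round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                     # full dataset, N = 2000 (illustrative)
idx = rng.choice(len(X), size=200, replace=False)  # subsample without replacement, n = 200
print(embedding_gap(X, idx))

Averaging this distance over many random subsamples gives an empirical picture of the deviation that the derived probability inequalities bound.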

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-schneider16,
  title     = {Probability Inequalities for Kernel Embeddings in Sampling without Replacement},
  author    = {Schneider, Markus},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {66--74},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/schneider16.pdf},
  url       = {https://proceedings.mlr.press/v51/schneider16.html},
  abstract  = {The \emph{kernel embedding} of distributions is a popular machine learning technique for manipulating probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate computed from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute, and approximate, constant-time solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, a scheme known as \emph{sampling without replacement}. In this work we generalize the results of Serfling (1974) to quantify the difference between these two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach-space-valued martingales in the setting of sampling without replacement.}
}
Endnote
%0 Conference Paper
%T Probability Inequalities for Kernel Embeddings in Sampling without Replacement
%A Markus Schneider
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-schneider16
%I PMLR
%P 66--74
%U https://proceedings.mlr.press/v51/schneider16.html
%V 51
%X The kernel embedding of distributions is a popular machine learning technique for manipulating probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate computed from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute, and approximate, constant-time solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, a scheme known as sampling without replacement. In this work we generalize the results of Serfling (1974) to quantify the difference between these two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach-space-valued martingales in the setting of sampling without replacement.
RIS
TY - CPAPER
TI - Probability Inequalities for Kernel Embeddings in Sampling without Replacement
AU - Markus Schneider
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA - 2016/05/02
ED - Arthur Gretton
ED - Christian C. Robert
ID - pmlr-v51-schneider16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 51
SP - 66
EP - 74
L1 - http://proceedings.mlr.press/v51/schneider16.pdf
UR - https://proceedings.mlr.press/v51/schneider16.html
AB - The kernel embedding of distributions is a popular machine learning technique for manipulating probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate computed from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute, and approximate, constant-time solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, a scheme known as sampling without replacement. In this work we generalize the results of Serfling (1974) to quantify the difference between these two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach-space-valued martingales in the setting of sampling without replacement.
ER -
APA
Schneider, M. (2016). Probability Inequalities for Kernel Embeddings in Sampling without Replacement. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:66-74. Available from https://proceedings.mlr.press/v51/schneider16.html.

Related Material