MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means

Matthieu Lerasle, Zoltán Szabó, Timothée Mathieu, Guillaume Lecué
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3782-3793, 2019.

Abstract

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, however, its classical empirical estimator can be arbitrarily severely affected even by a single outlier in the case of unbounded features. To the best of our knowledge, unfortunately, even the consistency of the few existing techniques that try to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for the kernel mean embedding and MMD with strong resistance to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions.
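The abstract invokes the median-of-means principle only at a high level: partition the sample into blocks, average each block, and report the median of the block means, so that a gross outlier can corrupt at most one block. As a hedged illustration of that principle alone (a one-dimensional sketch, not the paper's MONK estimator, which operates on kernel mean embeddings in an RKHS), the function name and block count below are illustrative choices:

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Median-of-means estimate of E[X]: split the sample into blocks,
    average each block, and return the median of the block means.
    A single large outlier corrupts only one block mean, so the
    median remains stable."""
    x = np.asarray(x, dtype=float)
    block_means = [block.mean() for block in np.array_split(x, n_blocks)]
    return float(np.median(block_means))

# 99 clean points plus one gross outlier
data = np.concatenate([np.zeros(99), [1000.0]])
print(data.mean())                # empirical mean is dragged to 10.0
print(median_of_means(data, 10))  # median-of-means stays at 0.0
```

The empirical mean moves arbitrarily far with a single outlier, while the median-of-means estimate is unaffected as long as fewer than half of the blocks are contaminated.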

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-lerasle19a,
  title     = {{MONK} -- Outlier-Robust Mean Embedding Estimation by Median-of-Means},
  author    = {Lerasle, Matthieu and Szab{\'o}, Zolt{\'a}n and Mathieu, Timoth{\'e}e and Lecu{\'e}, Guillaume},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3782--3793},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/lerasle19a/lerasle19a.pdf},
  url       = {https://proceedings.mlr.press/v97/lerasle19a.html},
  abstract  = {Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, however, its classical empirical estimator can be arbitrarily severely affected even by a single outlier in the case of unbounded features. To the best of our knowledge, unfortunately, even the consistency of the few existing techniques that try to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for the kernel mean embedding and MMD with strong resistance to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions.}
}
Endnote
%0 Conference Paper
%T MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means
%A Matthieu Lerasle
%A Zoltán Szabó
%A Timothée Mathieu
%A Guillaume Lecué
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-lerasle19a
%I PMLR
%P 3782--3793
%U https://proceedings.mlr.press/v97/lerasle19a.html
%V 97
%X Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, however, its classical empirical estimator can be arbitrarily severely affected even by a single outlier in the case of unbounded features. To the best of our knowledge, unfortunately, even the consistency of the few existing techniques that try to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for the kernel mean embedding and MMD with strong resistance to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions.
APA
Lerasle, M., Szabó, Z., Mathieu, T. &amp; Lecué, G. (2019). MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3782-3793. Available from https://proceedings.mlr.press/v97/lerasle19a.html.
