When OT meets MoM: Robust estimation of Wasserstein Distance

Guillaume Staerman, Pierre Laforgue, Pavlo Mozharovskyi, Florence d’Alché-Buc
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:136-144, 2021.

Abstract

Originating from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. It owes its recent ubiquity in generative modelling and variational inference to its ability to cope with distributions having non-overlapping support. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage a Median-of-Means (MoM) approach to provide robust estimates. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. Beyond computational issues, the choice of the partition size, i.e., the unique parameter of these robust estimators, is investigated in numerical experiments. Furthermore, these MoM estimators make Wasserstein Generative Adversarial Networks (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST.
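
As a rough illustration of the Median-of-Means principle the abstract refers to, the sketch below estimates a plain mean robustly: the sample is split into disjoint blocks, each block is averaged, and the median of the block averages is returned. This is not the paper's Wasserstein estimator, only the basic building block it leverages. The function name `median_of_means`, the `n_blocks` and `seed` parameters, and the toy data are assumptions made for this example.

```python
import random
import statistics


def median_of_means(samples, n_blocks, seed=0):
    """Median-of-Means estimate of the mean (illustrative sketch only)."""
    if not 1 <= n_blocks <= len(samples):
        raise ValueError("n_blocks must be between 1 and len(samples)")
    # Shuffle a copy so the partition does not depend on the input order.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Partition into n_blocks disjoint blocks of (nearly) equal size.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    # Average within each block, then take the median across blocks.
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


# Toy data (hypothetical): 99 clean observations and one gross outlier.
data = [1.0] * 99 + [1000.0]
naive = statistics.fmean(data)              # dragged away from 1.0 by the outlier
robust = median_of_means(data, n_blocks=5)  # only one block can be contaminated
```

Here the single outlier can corrupt at most one of the five block means, so the median across blocks is unaffected. The number of blocks plays a role analogous to the partition size discussed in the abstract: more blocks tolerate more outliers, at the cost of noisier block averages.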

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-staerman21a,
  title = {When OT meets MoM: Robust estimation of Wasserstein Distance},
  author = {Staerman, Guillaume and Laforgue, Pierre and Mozharovskyi, Pavlo and d'Alch{\'e}-Buc, Florence},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = {136--144},
  year = {2021},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  volume = {130},
  series = {Proceedings of Machine Learning Research},
  month = {13--15 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v130/staerman21a/staerman21a.pdf},
  url = {https://proceedings.mlr.press/v130/staerman21a.html},
  abstract = {Originating from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. It owes its recent ubiquity in generative modelling and variational inference to its ability to cope with distributions having non-overlapping support. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage a Median-of-Means (MoM) approach to provide robust estimates. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. Beyond computational issues, the choice of the partition size, i.e., the unique parameter of these robust estimators, is investigated in numerical experiments. Furthermore, these MoM estimators make Wasserstein Generative Adversarial Networks (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST.}
}
Endnote
%0 Conference Paper
%T When OT meets MoM: Robust estimation of Wasserstein Distance
%A Guillaume Staerman
%A Pierre Laforgue
%A Pavlo Mozharovskyi
%A Florence d’Alché-Buc
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-staerman21a
%I PMLR
%P 136--144
%U https://proceedings.mlr.press/v130/staerman21a.html
%V 130
%X Originating from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. It owes its recent ubiquity in generative modelling and variational inference to its ability to cope with distributions having non-overlapping support. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage a Median-of-Means (MoM) approach to provide robust estimates. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. Beyond computational issues, the choice of the partition size, i.e., the unique parameter of these robust estimators, is investigated in numerical experiments. Furthermore, these MoM estimators make Wasserstein Generative Adversarial Networks (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST.
APA
Staerman, G., Laforgue, P., Mozharovskyi, P. & d’Alché-Buc, F. (2021). When OT meets MoM: Robust estimation of Wasserstein Distance. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:136-144. Available from https://proceedings.mlr.press/v130/staerman21a.html.
