Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

Lunjia Hu, Omer Reingold
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1558-1566, 2021.

Abstract

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-hu21b, title = { Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers }, author = {Hu, Lunjia and Reingold, Omer}, booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics}, pages = {1558--1566}, year = {2021}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, volume = {130}, series = {Proceedings of Machine Learning Research}, month = {13--15 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v130/hu21b/hu21b.pdf}, url = {https://proceedings.mlr.press/v130/hu21b.html}, abstract = { We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting. } }
Endnote
%0 Conference Paper %T Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers %A Lunjia Hu %A Omer Reingold %B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2021 %E Arindam Banerjee %E Kenji Fukumizu %F pmlr-v130-hu21b %I PMLR %P 1558--1566 %U https://proceedings.mlr.press/v130/hu21b.html %V 130 %X We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.
APA
Hu, L. & Reingold, O.. (2021). Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1558-1566 Available from https://proceedings.mlr.press/v130/hu21b.html.

Related Material