Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1558-1566, 2021.
Abstract
We study the problem of robustly estimating the mean of a d-dimensional distribution given N examples, where most coordinates of every example may be missing and εN examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than εN examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time Õ(Nd). Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.
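To make the setting concrete, the following is a minimal sketch of the data model described above: N examples in d dimensions, most entries missing, and an ε fraction of examples corrupted. It is not the paper's algorithm. The function names, the Gaussian data, the fixed corruption value, the uniform missingness pattern, and the constant `factor` are all assumptions chosen for illustration. The estimator shown is a plain coordinate-wise median over observed entries, a simple baseline whose ℓ2 error generally grows with the dimension, which is the kind of dependence the abstract's dimension-independent guarantee avoids.

```python
import numpy as np


def make_incomplete_sample(n, d, eps, obs_prob, true_mean, rng):
    """Hypothetical generator: n examples, eps*n corrupted, entries observed w.p. obs_prob."""
    x = rng.normal(loc=true_mean, scale=1.0, size=(n, d))
    n_bad = int(eps * n)
    x[:n_bad] = 100.0  # assumed (very simple) corruption; a real adversary is arbitrary
    mask = rng.random((n, d)) < obs_prob  # True where the coordinate is observed
    x_obs = np.where(mask, x, np.nan)     # missing entries are encoded as NaN
    return x_obs, mask


def meets_coverage_assumption(mask, eps, factor=2.0):
    """Check that every coordinate is observed in at least factor * eps * n examples.

    `factor` stands in for the unspecified constant in the abstract's assumption.
    """
    n = mask.shape[0]
    return bool(np.all(mask.sum(axis=0) >= factor * eps * n))


def naive_mean(x_obs):
    """Per-coordinate mean over observed entries (not robust to corruption)."""
    return np.nanmean(x_obs, axis=0)


def coordinatewise_median(x_obs):
    """Per-coordinate median over observed entries (baseline only, not the paper's method)."""
    return np.nanmedian(x_obs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mean = np.zeros(20)
    x_obs, mask = make_incomplete_sample(2000, 20, 0.05, 0.3, true_mean, rng)
    print("coverage assumption holds:", meets_coverage_assumption(mask, 0.05))
    print("naive mean l2 error:  ", np.linalg.norm(naive_mean(x_obs) - true_mean))
    print("median l2 error:      ", np.linalg.norm(coordinatewise_median(x_obs) - true_mean))
```

In this toy run the naive mean is pulled far from the true mean by the corrupted examples, while the median stays close in each coordinate. The coverage check mirrors the abstract's requirement that each coordinate appear in more than a constant multiple of εN examples; without it, the corrupted examples could make up most of the observations for some coordinate.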