Robust Sparse Mean Estimation via Sum of Squares
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:4703-4763, 2022.
Abstract
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample- and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with 'certifiably bounded' $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde{O}(\epsilon)$ with sample complexity $m = O(k^4\,\mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively best possible.
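As a reading aid, the two guarantee regimes stated in the abstract can be collected in display form. This is only a restatement of the abstract's claims; the symbols $\mu$ (the true $k$-sparse mean), $\hat{\mu}$ (the algorithm's output), and $\|\cdot\|_2$ (the error norm) are notational assumptions made here, not fixed by the abstract itself.

% Restatement of the stated guarantees (sketch, notation assumed as above).
% Certifiably bounded t-th moments with sufficiently light tails:
\[
  \|\hat{\mu} - \mu\|_2 \le O\!\left(\epsilon^{1-1/t}\right),
  \qquad
  m = \frac{(k \log d)^{O(t)}}{\epsilon^{2-2/t}}.
\]
% Special case of the Gaussian distribution:
\[
  \|\hat{\mu} - \mu\|_2 \le \tilde{O}(\epsilon),
  \qquad
  m = \frac{O\!\left(k^4 \,\mathrm{polylog}(d)\right)}{\epsilon^2}.
\]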