[edit]
On efficient robust regression with subquadratic samples
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:3-74, 2026.
Abstract
We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.