On efficient robust regression with subquadratic samples

Deeksha Adil, Jarosław Błasiok, Hongjie Chen, Deepak Narayanan Sridharan
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:3-74, 2026.

Abstract

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-adil26a, title = {On efficient robust regression with subquadratic samples}, author = {Adil, Deeksha and B{\l}asiok, Jaros{\l}aw and Chen, Hongjie and Sridharan, Deepak Narayanan}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {3--74}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/adil26a/adil26a.pdf}, url = {https://proceedings.mlr.press/v336/adil26a.html}, abstract = {We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.} }
Endnote
%0 Conference Paper %T On efficient robust regression with subquadratic samples %A Deeksha Adil %A Jarosław Błasiok %A Hongjie Chen %A Deepak Narayanan Sridharan %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-adil26a %I PMLR %P 3--74 %U https://proceedings.mlr.press/v336/adil26a.html %V 336 %X We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.
APA
Adil, D., Błasiok, J., Chen, H. & Sridharan, D.N.. (2026). On efficient robust regression with subquadratic samples. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:3-74 Available from https://proceedings.mlr.press/v336/adil26a.html.

Related Material