On efficient robust regression with subquadratic samples

Deeksha Adil; Jarosław Błasiok; Hongjie Chen; Deepak Narayanan Sridharan

On efficient robust regression with subquadratic samples

Deeksha Adil, Jarosław Błasiok, Hongjie Chen, Deepak Narayanan Sridharan

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:3-74, 2026.

Abstract

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-adil26a,
  title = 	 {On efficient robust regression with subquadratic samples},
  author =       {Adil, Deeksha and B{\l}asiok, Jaros{\l}aw and Chen, Hongjie and Sridharan, Deepak Narayanan},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {3--74},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/adil26a/adil26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/adil26a.html},
  abstract = 	 {We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.}
}

Endnote

%0 Conference Paper
%T On efficient robust regression with subquadratic samples
%A Deeksha Adil
%A Jarosław Błasiok
%A Hongjie Chen
%A Deepak Narayanan Sridharan
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-adil26a
%I PMLR
%P 3--74
%U https://proceedings.mlr.press/v336/adil26a.html
%V 336
%X We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\varepsilon^4)$ samples, where $d$ is the dimension and $\varepsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\varepsilon\kappa})$ under the condition $\varepsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\varepsilon\kappa})$ when $\varepsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\varepsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min{d\varepsilon^{2}\kappa^{2}, \varepsilon^{2}d^{2}}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

APA

Adil, D., Błasiok, J., Chen, H. & Sridharan, D.N.. (2026). On efficient robust regression with subquadratic samples. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:3-74 Available from https://proceedings.mlr.press/v336/adil26a.html.

Related Material

Download PDF