[edit]
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization (Extended Abstract)
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:6849-6851, 2026.
Abstract
Existing theory suggests that for linear regression problems categorized by capacity and source conditions, \emph{gradient descent} (GD) is always minimax optimal, while both \emph{ridge regression} and online \emph{stochastic gradient descent} (SGD) are polynomially suboptimal for certain categories of such problems. Moving beyond minimax theory, this work provides \emph{instance-wise} comparisons of the finite-sample risks for these algorithms on any well-specified linear regression problem. Our analysis yields three key findings. First, GD \emph{dominates} ridge regression: with comparable regularization, the excess risk of GD is \emph{always} within a constant factor of ridge, but ridge can be \emph{polynomially} worse even when tuned optimally. Second, GD is \emph{incomparable} with SGD. While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by \emph{benign overfitting} theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems—those with fast and continuously decaying covariance spectra—which includes all problems satisfying the standard capacity condition.