[edit]

# Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

*Proceedings of the 40th International Conference on Machine Learning*, PMLR 202:42543-42573, 2023.

#### Abstract

In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as

*benign overfitting*, is a major mystery that received a lot of recent attention. One common mechanism for benign overfitting is*implicit regularization*, where the training process leads to additional properties for the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem $y = \beta^{\ast\top} x +\xi$ with sparse $\beta^{\ast}$, neither minimum $\ell_1$ or $\ell_2$ norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on careful analysis of the training dynamics and provides another example of implicit regularization effect that goes beyond norm minimization.