Memorize to generalize: on the necessity of interpolation in high dimensional linear regression

Chen Cheng; John Duchi; Rohith Kuditipudi

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:5528-5560, 2022.

Abstract

We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X \theta + w$ with random design $X \in \mathbb{R}^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$. We precisely characterize how prediction (test) error necessarily scales with training error in this setting. An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and will suffer growth in excess prediction error at least linear in the training error. Thus, optimal performance requires fitting training data to substantially higher accuracy than the inherent noise floor of the problem.
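As a rough illustration of the setting only, the sketch below simulates overparameterized linear regression $y = X\theta + w$ with $d/n = \gamma > 1$ and reports training error next to excess prediction error for a few ridge penalties. Everything specific here is an assumption made for illustration and is not taken from the paper: the isotropic Gaussian design, the normalization of $\theta$, the restriction to ridge estimators, the chosen values of `n`, `gamma`, `sigma`, and `lams`, and the function name `simulate`. It does not reproduce the paper's characterization, which concerns estimators in general.

```python
import numpy as np


def simulate(n=200, gamma=2.0, sigma=0.1,
             lams=(0.0, 1e-3, 1e-2, 1e-1, 1.0), seed=0):
    """Return (lambda, training error, excess prediction error) per ridge fit.

    Hypothetical helper: the data model and parameter defaults are
    illustrative assumptions, not the paper's experimental setup.
    """
    rng = np.random.default_rng(seed)
    d = int(gamma * n)                             # d / n = gamma > 1

    X = rng.standard_normal((n, d))                # assumed isotropic Gaussian design
    theta = rng.standard_normal(d) / np.sqrt(d)    # assumed signal, ||theta||^2 close to 1
    y = X @ theta + sigma * rng.standard_normal(n)  # label noise with variance sigma^2

    gram = X @ X.T                                 # n x n, invertible a.s. since d > n
    rows = []
    for lam in lams:
        # Ridge in dual form for the objective (1/n)||X t - y||^2 + lam ||t||^2:
        #   theta_hat = X^T (X X^T + n * lam * I)^{-1} y.
        # lam = 0 yields the minimum-norm interpolator.
        alpha = np.linalg.solve(gram + n * lam * np.eye(n), y)
        theta_hat = X.T @ alpha

        train_err = float(np.mean((X @ theta_hat - y) ** 2))
        # For a fresh x ~ N(0, I), E[(x^T (theta_hat - theta))^2] = ||theta_hat - theta||^2.
        excess_pred_err = float(np.sum((theta_hat - theta) ** 2))
        rows.append((lam, train_err, excess_pred_err))
    return rows


if __name__ == "__main__":
    for lam, train_err, excess in simulate():
        print(f"lambda={lam:g}  train={train_err:.3e}  excess_pred={excess:.3e}")
```

Shrinking `sigma` and comparing the training error of each fit against `sigma**2` and `sigma**4` is one way to explore the regime the abstract describes, though a single finite-sample run is only suggestive.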

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-cheng22a,
  title = 	 {Memorize to generalize: on the necessity of interpolation in high dimensional linear regression},
  author =       {Cheng, Chen and Duchi, John and Kuditipudi, Rohith},
  booktitle = 	 {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages = 	 {5528--5560},
  year = 	 {2022},
  editor = 	 {Loh, Po-Ling and Raginsky, Maxim},
  volume = 	 {178},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--05 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v178/cheng22a/cheng22a.pdf},
  url = 	 {https://proceedings.mlr.press/v178/cheng22a.html},
  abstract = 	 {We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X \theta + w$ with random design $X \in \real^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$.  We precisely characterize how prediction (test) error necessarily scales with training error in this setting.  An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and  will suffer growth in excess prediction error at least linear in the training error. Thus, optimal performance requires fitting training data to substantially higher accuracy than the inherent noise floor of the problem.}
}

Endnote

%0 Conference Paper
%T Memorize to generalize: on the necessity of interpolation in high dimensional linear regression
%A Chen Cheng
%A John Duchi
%A Rohith Kuditipudi
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-cheng22a
%I PMLR
%P 5528--5560
%U https://proceedings.mlr.press/v178/cheng22a.html
%V 178
%X We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X \theta + w$ with random design $X \in \real^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$.  We precisely characterize how prediction (test) error necessarily scales with training error in this setting.  An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and  will suffer growth in excess prediction error at least linear in the training error. Thus, optimal performance requires fitting training data to substantially higher accuracy than the inherent noise floor of the problem.

APA


Cheng, C., Duchi, J., & Kuditipudi, R. (2022). Memorize to generalize: on the necessity of interpolation in high dimensional linear regression. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:5528-5560. Available from https://proceedings.mlr.press/v178/cheng22a.html.
