Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

Yutong Wang, Rishi Sonthalia, Wei Hu
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4483-4491, 2024.

Abstract

We study the generalization capability of nearly-interpolating linear regressors: ${\beta}$’s whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix ${\Sigma}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $\tau$ fixed, ${\beta}$ has squared $\ell_2$-norm $\mathbb{E}[\|{{\beta}}\|_{2}^{2}] = \Omega(n^{\alpha})$ where $n$ is the number of samples and $\alpha >1$ is the exponent of the eigendecay, i.e., $\lambda_i({\Sigma}) \sim i^{-\alpha}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $\alpha$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.
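To make the norm-growth claim concrete, below is a minimal simulation sketch (not the paper's construction or proof): it assumes Gaussian data with a diagonal power-law covariance $\lambda_i = i^{-\alpha}$, realizes a near-interpolator as ridge-regularized least squares whose regularization is tuned by bisection until the training MSE is approximately $\tau$, and reports the squared $\ell_2$-norm as $n$ grows. All parameter values (d, alpha, tau, noise level) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def near_interpolator_sq_norm(n, d=2000, alpha=2.0, tau=0.5, noise_std=1.0):
    """Return ||beta||_2^2 of a ridge regressor tuned to training MSE ~= tau.
    Illustrative sketch only; not the paper's estimator or setting."""
    # Power-law eigendecay: lambda_i ~ i^{-alpha}
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
    # Rows of X ~ N(0, diag(eigs)); noisy linear responses
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)
    beta_star = rng.standard_normal(d) / np.sqrt(d)   # arbitrary ground truth
    y = X @ beta_star + noise_std * rng.standard_normal(n)

    # Geometric bisection on the ridge parameter so that training MSE ~= tau
    # (tau below the noise floor noise_std**2, so the fit is a near-interpolator).
    lo, hi = 1e-8, 1e4
    for _ in range(100):
        lam = np.sqrt(lo * hi)
        beta = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
        train_mse = np.mean((X @ beta - y) ** 2)
        lo, hi = (lo, lam) if train_mse > tau else (lam, hi)
    return np.sum(beta ** 2)

for n in [50, 100, 200, 400]:
    print(n, near_interpolator_sq_norm(n))

Under these assumptions, the printed squared norms should grow rapidly with n (the paper's lower bound for this regime is $\Omega(n^{\alpha})$ for fixed $\tau$).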

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-wang24l,
  title     = {Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization},
  author    = {Wang, Yutong and Sonthalia, Rishi and Hu, Wei},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4483--4491},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/wang24l/wang24l.pdf},
  url       = {https://proceedings.mlr.press/v238/wang24l.html}
}
Endnote
%0 Conference Paper
%T Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization
%A Yutong Wang
%A Rishi Sonthalia
%A Wei Hu
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-wang24l
%I PMLR
%P 4483--4491
%U https://proceedings.mlr.press/v238/wang24l.html
%V 238
APA
Wang, Y., Sonthalia, R., & Hu, W. (2024). Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4483-4491. Available from https://proceedings.mlr.press/v238/wang24l.html.
