[edit]
Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4483-4491, 2024.
Abstract
We study the generalization capability of nearly-interpolating linear regressors: ${\beta}$’s whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix ${\Sigma}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $\tau$ fixed, ${\beta}$ has squared $\ell_2$-norm $\mathbb{E}[\|{{\beta}}\|_{2}^{2}] = \Omega(n^{\alpha})$ where $n$ is the number of samples and $\alpha >1$ is the exponent of the eigendecay, i.e., $\lambda_i({\Sigma}) \sim i^{-\alpha}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $\alpha$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.