On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Peizhong Ju, Xiaojun Lin, Ness Shroff
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5137-5147, 2021.

Abstract

In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.
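The setting described in the abstract can be illustrated with a small numerical sketch. The snippet below is not from the paper: it builds the NTK gradient features of a two-layer ReLU network with no bias term (training only the bottom-layer weights), computes the minimum $\ell_2$-norm solution that interpolates the training labels via the pseudoinverse, and evaluates the test error of the resulting overfitted model. The dimensions, initialization scheme, and ground-truth function here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch (not the authors' code): min l2-norm interpolation with NTK
# features of a two-layer ReLU network without bias, training only the
# bottom-layer weights. Scalings and the ground truth are assumptions.

rng = np.random.default_rng(0)
d, p, n, n_test = 5, 2000, 50, 500

# Random initialization: hidden weights w_j on the unit sphere, top weights a_j = +/-1.
W0 = rng.normal(size=(p, d))
W0 /= np.linalg.norm(W0, axis=1, keepdims=True)
a = rng.choice([-1.0, 1.0], size=p)

def ntk_features(X):
    """NTK gradient features h_j(x) = (a_j / sqrt(p)) * x * 1{w_j^T x > 0},
    flattened into a row of length p*d per input."""
    act = (X @ W0.T > 0).astype(float)                      # (n, p) activation pattern
    H = (act * a / np.sqrt(p))[:, :, None] * X[:, None, :]  # (n, p, d)
    return H.reshape(X.shape[0], -1)                        # (n, p*d)

def f_true(X):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sum(X, axis=1) ** 2

# Training inputs on the unit sphere.
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = f_true(X)

# Min l2-norm overfitting solution: delta_w = H^+ y interpolates the training set.
H = ntk_features(X)
delta_w = np.linalg.pinv(H) @ y

# Test error of the overfitted NTK model.
Xt = rng.normal(size=(n_test, d)); Xt /= np.linalg.norm(Xt, axis=1, keepdims=True)
pred = ntk_features(Xt) @ delta_w
print("test MSE:", np.mean((pred - f_true(Xt)) ** 2))
```

Varying $n$ and $p$ in such a sketch is one way to probe the behavior the paper analyzes, though the sketch itself is only suggestive and does not reproduce the paper's learnable-function class or its bounds.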

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-ju21a,
  title     = {On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models},
  author    = {Ju, Peizhong and Lin, Xiaojun and Shroff, Ness},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5137--5147},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/ju21a/ju21a.pdf},
  url       = {https://proceedings.mlr.press/v139/ju21a.html}
}
APA
Ju, P., Lin, X., & Shroff, N. (2021). On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5137-5147. Available from https://proceedings.mlr.press/v139/ju21a.html.