On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Peizhong Ju; Xiaojun Lin; Ness Shroff

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5137-5147, 2021.

Abstract

In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.
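As a rough illustration of the setting the abstract describes, the following NumPy sketch builds the NTK feature map of a bias-free two-layer ReLU network and computes the minimum $\ell_2$-norm solution that interpolates the training data. It is a minimal sketch under stated assumptions, not the authors' code: the function names `ntk_features` and `min_norm_fit`, the Gaussian initialization `V0`, the unit-norm inputs, the sizes `n`, `d`, `p`, and the linear ground-truth function are all hypothetical choices made for the example.

```python
import numpy as np


def ntk_features(X, V0):
    """NTK feature map of a two-layer ReLU network without bias.

    X:  (n, d) inputs.
    V0: (p, d) initial hidden-layer weights (assumed random; hypothetical choice).
    Returns an (n, p*d) matrix whose i-th row concatenates, over neurons j,
    the vector 1{x_i^T V0[j] > 0} * x_i.
    """
    active = (X @ V0.T > 0).astype(X.dtype)        # (n, p) ReLU activation pattern
    feats = active[:, :, None] * X[:, None, :]     # (n, p, d)
    return feats.reshape(X.shape[0], -1)           # (n, p*d)


def min_norm_fit(H, y):
    """Minimum l2-norm solution of H w = y via the pseudoinverse."""
    return np.linalg.pinv(H) @ y


rng = np.random.default_rng(0)
n, d, p = 20, 3, 200                               # illustrative sizes, with p*d >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # assumption: inputs on the unit sphere
V0 = rng.standard_normal((p, d))
y = X[:, 0]                                        # hypothetical ground truth f(x) = x_1

H = ntk_features(X, V0)
dV = min_norm_fit(H, y)
train_residual = np.max(np.abs(H @ dV - y))        # near zero: the model overfits (interpolates)
```

Because the feature dimension `p * d` far exceeds `n` here, the system `H w = y` has many solutions, and the pseudoinverse picks the one of smallest Euclidean norm. Evaluating `ntk_features(X_test, V0) @ dV` on fresh inputs would give an empirical test error to compare across values of `p` and `n`.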

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-ju21a,
  title     = {On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models},
  author    = {Ju, Peizhong and Lin, Xiaojun and Shroff, Ness},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5137--5147},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/ju21a/ju21a.pdf},
  url       = {https://proceedings.mlr.press/v139/ju21a.html},
  abstract = 	 {In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.}
}

Endnote

%0 Conference Paper
%T On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models
%A Peizhong Ju
%A Xiaojun Lin
%A Ness Shroff
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-ju21a
%I PMLR
%P 5137--5147
%U https://proceedings.mlr.press/v139/ju21a.html
%V 139
%X In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.

APA

Ju, P., Lin, X. & Shroff, N. (2021). On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5137-5147. Available from https://proceedings.mlr.press/v139/ju21a.html.
