Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Blake Bordelon; Abdulkadir Canatar; Cengiz Pehlevan

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1024-1034, 2020.

Abstract

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and the MNIST dataset.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-bordelon20a,
  title = 	 {Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks},
  author =       {Bordelon, Blake and Canatar, Abdulkadir and Pehlevan, Cengiz},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {1024--1034},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/bordelon20a/bordelon20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/bordelon20a.html},
  abstract = 	 {We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and MNIST dataset.}
}

Endnote

%0 Conference Paper
%T Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
%A Blake Bordelon
%A Abdulkadir Canatar
%A Cengiz Pehlevan
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-bordelon20a
%I PMLR
%P 1024--1034
%U https://proceedings.mlr.press/v119/bordelon20a.html
%V 119
%X We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and MNIST dataset.

APA

Bordelon, B., Canatar, A., & Pehlevan, C. (2020). Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1024-1034. Available from https://proceedings.mlr.press/v119/bordelon20a.html.
