Approximation beats concentration? An approximation view on inference with smooth radial kernels

Mikhail Belkin

Proceedings of the 31st Conference On Learning Theory, PMLR 75:1348-1361, 2018.

Abstract

Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data. In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze eigenvalue decay of kernel operators and matrices, properties of eigenfunctions/eigenvectors, and “Fourier” coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat shattering dimension of the balls in Reproducing Kernel Hilbert Spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their “native” kernel space also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods. It is important to note that most of our bounds are measure independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this the “approximation beats concentration” phenomenon, as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory.
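The eigenvalue decay mentioned in the abstract can be observed numerically. The following is a minimal sketch, not code from the paper: it assumes a Gaussian kernel as the smooth radial kernel, a bandwidth of 1, and 200 points drawn uniformly from [0, 1]. The helper names `gaussian_kernel_matrix` and `sorted_eigenvalues` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel_matrix(points, bandwidth=1.0):
    """Return K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel and the bandwidth value are assumptions made
    for this illustration; they are not taken from the paper.
    """
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def sorted_eigenvalues(kernel_matrix):
    """Eigenvalues of a symmetric matrix, largest first."""
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    return np.linalg.eigvalsh(kernel_matrix)[::-1]


# Hypothetical data: 200 points sampled uniformly from the interval [0, 1].
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(200, 1))

eigenvalues = sorted_eigenvalues(gaussian_kernel_matrix(points, bandwidth=1.0))
normalized = eigenvalues / eigenvalues[0]

# Print the leading eigenvalues relative to the largest one.
for index in range(8):
    print(f"lambda_{index} / lambda_0 = {normalized[index]:.3e}")
```

Changing the sampling distribution (for example, replacing `rng.uniform` with a clipped normal draw on the same interval) is one way to probe the abstract's point that the decay is governed by the kernel and the domain rather than by the measure. Values far down the spectrum fall below floating-point precision and should not be read literally.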

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-belkin18a,
  title =        {Approximation beats concentration? An approximation view on inference with smooth radial kernels},
  author =       {Belkin, Mikhail},
  booktitle =    {Proceedings of the 31st Conference On Learning Theory},
  pages = 	 {1348--1361},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/belkin18a/belkin18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/belkin18a.html},
  abstract =     {Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data. In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze eigenvalue decay of kernel operators and matrices, properties of eigenfunctions/eigenvectors, and “Fourier” coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat shattering dimension of the balls in Reproducing Kernel Hilbert Spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their “native” kernel space also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods. It is important to note that most of our bounds are measure independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this the “approximation beats concentration” phenomenon, as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory.}
}

Endnote

%0 Conference Paper
%T Approximation beats concentration? An approximation view on inference with smooth radial kernels
%A Mikhail Belkin
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-belkin18a
%I PMLR
%P 1348--1361
%U https://proceedings.mlr.press/v75/belkin18a.html
%V 75
%X Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data. In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze eigenvalue decay of kernel operators and matrices, properties of eigenfunctions/eigenvectors, and “Fourier” coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat shattering dimension of the balls in Reproducing Kernel Hilbert Spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their “native” kernel space also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods. It is important to note that most of our bounds are measure independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this the “approximation beats concentration” phenomenon, as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory.

APA


Belkin, M. (2018). Approximation beats concentration? An approximation view on inference with smooth radial kernels. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1348-1361. Available from https://proceedings.mlr.press/v75/belkin18a.html.
