Rate of Convergence of Polynomial Networks to Gaussian Processes

Adam Klukowski
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:701-722, 2022.

Abstract

We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
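The abstract's central claim can be illustrated numerically. The sketch below (not code from the paper; all names and parameter choices are illustrative) samples the output of a random one-hidden-layer network with the centred polynomial activation sigma(t) = t^2 - 1 at a fixed input, and estimates its 1-D 2-Wasserstein distance to the Gaussian limit via the quantile formula. One would expect the distance to shrink roughly like 1/sqrt(n), consistent with the paper's rate.

```python
# Hedged illustration (not the paper's code): empirical W2 distance between
# a random one-hidden-layer network's output at a fixed input and its
# Gaussian limit, for the polynomial activation sigma(t) = t^2 - 1.
import numpy as np
from statistics import NormalDist

def network_samples(n, num_samples, x, rng):
    """Draw samples of f(x) = n^{-1/2} * sum_i a_i * sigma(<w_i, x>)."""
    d = x.shape[0]
    w = rng.standard_normal((num_samples, n, d))  # hidden-layer weights
    a = rng.standard_normal((num_samples, n))     # output weights
    pre = w @ x                                   # preactivations, shape (num_samples, n)
    return (a * (pre**2 - 1)).sum(axis=1) / np.sqrt(n)

def w2_to_gaussian(samples, std):
    """Empirical 2-Wasserstein distance to N(0, std^2), 1-D quantile formula."""
    s = np.sort(samples)
    m = len(s)
    nd = NormalDist(0.0, std)
    q = np.array([nd.inv_cdf((i + 0.5) / m) for i in range(m)])
    return float(np.sqrt(np.mean((s - q) ** 2)))

# For ||x|| = 1 the limit is N(0, 2): with independent standard normals
# a, g we have Var(a * (g^2 - 1)) = E[(g^2 - 1)^2] = 2.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
for n in (4, 64, 256):
    print(n, w2_to_gaussian(network_samples(n, 5000, x, rng), np.sqrt(2.0)))
```

The printed distances should decrease as n grows, up to the Monte Carlo noise floor set by the number of samples.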

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-klukowski22a,
  title     = {Rate of Convergence of Polynomial Networks to Gaussian Processes},
  author    = {Klukowski, Adam},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {701--722},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/klukowski22a/klukowski22a.pdf},
  url       = {https://proceedings.mlr.press/v178/klukowski22a.html},
  abstract  = {We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.}
}
Endnote
%0 Conference Paper
%T Rate of Convergence of Polynomial Networks to Gaussian Processes
%A Adam Klukowski
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-klukowski22a
%I PMLR
%P 701--722
%U https://proceedings.mlr.press/v178/klukowski22a.html
%V 178
%X We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
APA
Klukowski, A. (2022). Rate of Convergence of Polynomial Networks to Gaussian Processes. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:701-722. Available from https://proceedings.mlr.press/v178/klukowski22a.html.