Rate of Convergence of Polynomial Networks to Gaussian Processes

Adam Klukowski
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:701-722, 2022.

Abstract

We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
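The abstract's central claim can be illustrated numerically. The sketch below (not code from the paper; all names and parameter choices are illustrative) samples the output of a random one-hidden-layer network with the centred polynomial activation sigma(t) = t^2 - 1 at a fixed input, and estimates its 1-D 2-Wasserstein distance to the Gaussian limit via the quantile formula. One would expect the distance to shrink roughly like 1/sqrt(n), consistent with the paper's rate.

```python
# Hedged illustration (not the paper's code): empirical W2 distance between
# a random one-hidden-layer network's output at a fixed input and its
# Gaussian limit, for the polynomial activation sigma(t) = t^2 - 1.
import numpy as np
from statistics import NormalDist

def network_samples(n, num_samples, x, rng):
    """Draw samples of f(x) = n^{-1/2} * sum_i a_i * sigma(<w_i, x>)."""
    d = x.shape[0]
    w = rng.standard_normal((num_samples, n, d))  # hidden-layer weights
    a = rng.standard_normal((num_samples, n))     # output weights
    pre = w @ x                                   # preactivations, shape (num_samples, n)
    return (a * (pre**2 - 1)).sum(axis=1) / np.sqrt(n)

def w2_to_gaussian(samples, std):
    """Empirical 2-Wasserstein distance to N(0, std^2), 1-D quantile formula."""
    s = np.sort(samples)
    m = len(s)
    nd = NormalDist(0.0, std)
    q = np.array([nd.inv_cdf((i + 0.5) / m) for i in range(m)])
    return float(np.sqrt(np.mean((s - q) ** 2)))

# For ||x|| = 1 the limit is N(0, 2): with independent standard normals
# a, g we have Var(a * (g^2 - 1)) = E[(g^2 - 1)^2] = 2.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
for n in (4, 64, 256):
    print(n, w2_to_gaussian(network_samples(n, 5000, x, rng), np.sqrt(2.0)))
```

The printed distances should decrease as n grows, up to the Monte Carlo noise floor set by the number of samples.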

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-klukowski22a,
  title     = {Rate of Convergence of Polynomial Networks to Gaussian Processes},
  author    = {Klukowski, Adam},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {701--722},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/klukowski22a/klukowski22a.pdf},
  url       = {https://proceedings.mlr.press/v178/klukowski22a.html},
  abstract  = {We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.}
}
Endnote
%0 Conference Paper
%T Rate of Convergence of Polynomial Networks to Gaussian Processes
%A Adam Klukowski
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-klukowski22a
%I PMLR
%P 701--722
%U https://proceedings.mlr.press/v178/klukowski22a.html
%V 178
%X We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in n for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
APA
Klukowski, A. (2022). Rate of Convergence of Polynomial Networks to Gaussian Processes. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:701-722. Available from https://proceedings.mlr.press/v178/klukowski22a.html.