Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Felix Biggs; Benjamin Guedj

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1963-1981, 2022.

Abstract

We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
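The network class described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a bias-free single hidden layer with no extra scaling, and the names `l2_normalise`, `shallow_forward`, `hidden_weights`, and `output_weights` are hypothetical. It shows only the forward pass on an $L_2$-normalised input with an erf or GELU activation, not the bounds themselves.

```python
import math


def l2_normalise(x):
    """Scale a vector to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm == 0.0 else [v / norm for v in x]


def erf_act(z):
    """Sigmoid-shaped Gaussian error function activation."""
    return math.erf(z)


def gelu(z):
    """Exact GELU: z times the standard normal CDF evaluated at z."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def shallow_forward(x, hidden_weights, output_weights, activation=erf_act):
    """Forward pass of a single-hidden-layer network (assumed bias-free).

    hidden_weights: one row of input weights per hidden unit.
    output_weights: one row of hidden-unit weights per output.
    """
    x = l2_normalise(x)
    hidden = [activation(sum(w * v for w, v in zip(row, x)))
              for row in hidden_weights]
    return [sum(w * h for w, h in zip(row, hidden))
            for row in output_weights]


# Usage: two hidden units, one output, erf activation by default.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, -1.0]]
scores = shallow_forward([3.0, 4.0], U, V)
gelu_scores = shallow_forward([3.0, 4.0], U, V, activation=gelu)
```

Passing `activation=gelu` switches to the second activation named in the abstract; the parameters here are deterministic, matching the setting the bounds are stated for.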

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-biggs22a,
  title = {Non-Vacuous Generalisation Bounds for Shallow Neural Networks},
  author = {Biggs, Felix and Guedj, Benjamin},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {1963--1981},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/biggs22a/biggs22a.pdf},
  url = {https://proceedings.mlr.press/v162/biggs22a.html},
  abstract = 	 {We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.}
}

EndNote

%0 Conference Paper
%T Non-Vacuous Generalisation Bounds for Shallow Neural Networks
%A Felix Biggs
%A Benjamin Guedj
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-biggs22a
%I PMLR
%P 1963--1981
%U https://proceedings.mlr.press/v162/biggs22a.html
%V 162
%X We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.

APA

Biggs, F. & Guedj, B. (2022). Non-Vacuous Generalisation Bounds for Shallow Neural Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:1963-1981. Available from https://proceedings.mlr.press/v162/biggs22a.html.
