Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Felix Biggs, Benjamin Guedj
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1963-1981, 2022.

Abstract

We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-biggs22a,
  title = {Non-Vacuous Generalisation Bounds for Shallow Neural Networks},
  author = {Biggs, Felix and Guedj, Benjamin},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {1963--1981},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/biggs22a/biggs22a.pdf},
  url = {https://proceedings.mlr.press/v162/biggs22a.html},
  abstract = {We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.}
}
Endnote
%0 Conference Paper
%T Non-Vacuous Generalisation Bounds for Shallow Neural Networks
%A Felix Biggs
%A Benjamin Guedj
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-biggs22a
%I PMLR
%P 1963--1981
%U https://proceedings.mlr.press/v162/biggs22a.html
%V 162
%X We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
APA
Biggs, F. & Guedj, B. (2022). Non-Vacuous Generalisation Bounds for Shallow Neural Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:1963-1981. Available from https://proceedings.mlr.press/v162/biggs22a.html.
