Critical feature learning in deep neural networks

Kirsten Fischer; Javed Lindner; David Dahmen; Zohar Ringel; Michael Krämer; Moritz Helias

Critical feature learning in deep neural networks

Kirsten Fischer, Javed Lindner, David Dahmen, Zohar Ringel, Michael Krämer, Moritz Helias

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13660-13690, 2024.

Abstract

A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width $N$. A large deviation approach, which is exact in the proportional limit for the number of data points $P=\alpha N\to\infty$, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively, to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite-width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-fischer24a,
  title = 	 {Critical feature learning in deep neural networks},
  author =       {Fischer, Kirsten and Lindner, Javed and Dahmen, David and Ringel, Zohar and Kr\"{a}mer, Michael and Helias, Moritz},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {13660--13690},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fischer24a/fischer24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/fischer24a.html},
  abstract = 	 {A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width $N$. A large deviation approach, which is exact in the proportional limit for the number of data points $P=\alpha N\to\infty$, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively, to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite-width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.}
}

Endnote

%0 Conference Paper
%T Critical feature learning in deep neural networks
%A Kirsten Fischer
%A Javed Lindner
%A David Dahmen
%A Zohar Ringel
%A Michael Krämer
%A Moritz Helias
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-fischer24a
%I PMLR
%P 13660--13690
%U https://proceedings.mlr.press/v235/fischer24a.html
%V 235
%X A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width $N$. A large deviation approach, which is exact in the proportional limit for the number of data points $P=\alpha N\to\infty$, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively, to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite-width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.

APA

Fischer, K., Lindner, J., Dahmen, D., Ringel, Z., Krämer, M. & Helias, M.. (2024). Critical feature learning in deep neural networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13660-13690 Available from https://proceedings.mlr.press/v235/fischer24a.html.

Critical feature learning in deep neural networks

Abstract

Cite this Paper

Related Material