Asymptotics of Learning with Deep Structured (Random) Features

Dominik Schröder, Daniil Dmitriev, Hugo Cui, Bruno Loureiro
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:43862-43894, 2024.

Abstract

For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterisation is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
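
To make the setting concrete, here is a minimal, self-contained sketch (Python/NumPy) of the kind of experiment the abstract describes: a deep random-feature map with structured, row-correlated Gaussian weights, a ridge-regression readout trained on the features, and an empirical estimate of the test error. Every concrete choice below (the depth, the tanh activation, the Toeplitz row covariance in the hypothetical structured_weights helper, the linear teacher, the dimensions) is an illustrative assumption rather than the paper's exact model; in particular, a genuine Gaussian rainbow network would let each layer's row-wise covariance depend on the weights of the previous layers.

# Minimal sketch (illustrative assumptions throughout, not the paper's
# exact setting): ridge regression on the readout of a deep random-feature
# map whose Gaussian weights have correlated rows.
import numpy as np

rng = np.random.default_rng(0)

d, widths = 200, [300, 300]   # input dimension and hidden layer widths
n_train, n_test = 400, 2000   # sample sizes, proportional to d in spirit
lam = 1e-2                    # ridge regularisation for the readout


def structured_weights(p, q, rho=0.5):
    """Hypothetical helper: a p x q Gaussian weight matrix whose rows share
    the Toeplitz covariance C[i, j] = rho**|i - j| (one simple way to make
    the weights 'structured' rather than i.i.d.)."""
    idx = np.arange(q)
    C = rho ** np.abs(np.subtract.outer(idx, idx))
    L = np.linalg.cholesky(C)                     # C = L @ L.T
    return rng.standard_normal((p, q)) @ L.T / np.sqrt(q)


# Deep feature map phi(x) = tanh(W_2 tanh(W_1 x)); only the readout is learned.
Ws, q = [], d
for p in widths:
    Ws.append(structured_weights(p, q))
    q = p


def features(X):
    Z = X
    for W in Ws:
        Z = np.tanh(Z @ W.T)
    return Z


# Gaussian inputs and a simple linear teacher on the inputs (an assumption).
theta = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr, y_te = X_tr @ theta, X_te @ theta

# Ridge regression on the features: w = (F^T F / n + lam I)^{-1} F^T y / n.
F_tr, F_te = features(X_tr), features(X_te)
p_out = F_tr.shape[1]
w = np.linalg.solve(F_tr.T @ F_tr / n_train + lam * np.eye(p_out),
                    F_tr.T @ y_tr / n_train)

# The paper's result: as d, the widths, and n_train grow proportionally,
# this empirical test error concentrates on a deterministic value that
# depends on the feature map only through the population covariance
# E[phi(x) phi(x)^T] of the features.
print(f"empirical test error: {np.mean((F_te @ w - y_te) ** 2):.4f}")

The Toeplitz choice is just one convenient way to make the weight rows statistically dependent; for the purpose of this sketch, any positive-definite row covariance would do.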

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-schroder24a,
  title     = {Asymptotics of Learning with Deep Structured ({R}andom) Features},
  author    = {Schr\"{o}der, Dominik and Dmitriev, Daniil and Cui, Hugo and Loureiro, Bruno},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {43862--43894},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/schroder24a/schroder24a.pdf},
  url       = {https://proceedings.mlr.press/v235/schroder24a.html},
  abstract  = {For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterisation is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.}
}
Endnote
%0 Conference Paper
%T Asymptotics of Learning with Deep Structured (Random) Features
%A Dominik Schröder
%A Daniil Dmitriev
%A Hugo Cui
%A Bruno Loureiro
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-schroder24a
%I PMLR
%P 43862--43894
%U https://proceedings.mlr.press/v235/schroder24a.html
%V 235
%X For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterisation is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
APA
Schröder, D., Dmitriev, D., Cui, H. & Loureiro, B. (2024). Asymptotics of Learning with Deep Structured (Random) Features. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:43862-43894. Available from https://proceedings.mlr.press/v235/schroder24a.html.
