Deterministic equivalent and error universality of deep random features learning

Dominik Schröder, Hugo Cui, Daniil Dmitriev, Bruno Loureiro
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:30285-30320, 2023.

Abstract

This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and trainable readout layer. This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures. First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp asymptotic formula for it. Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices which can be of independent interest. Second, we conjecture the asymptotic Gaussian universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures. We provide extensive numerical evidence for this conjecture, which requires the derivation of closed-form expressions for the layer-wise post-activation population covariances. In light of our results, we investigate the interplay between architecture design and implicit regularization.
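As a concrete illustration of the setting described above, the sketch below (a minimal illustration, not the authors' code) builds a fully connected network with frozen Gaussian intermediate layers, generates labels from a random Gaussian target that shares those layers, and trains only the readout by ridge regression. The input dimension, layer widths, tanh activation, and ridge penalty are illustrative assumptions.

# Minimal sketch of deep random features learning with a ridge readout,
# in the case where learner and target share the same frozen intermediate layers.
import numpy as np

rng = np.random.default_rng(0)
d, widths, n_train, n_test, lam = 200, [300, 300], 1000, 2000, 1e-2

# Frozen intermediate layers: i.i.d. Gaussian weights, never trained.
Ws = [rng.standard_normal((m, k)) / np.sqrt(k)
      for k, m in zip([d] + widths[:-1], widths)]

def features(X):
    # Propagate inputs through the frozen layers; only the readout is learned.
    H = X
    for W in Ws:
        H = np.tanh(H @ W.T)
    return H

# Random Gaussian target: same frozen layers, random readout vector.
theta_star = rng.standard_normal(widths[-1]) / np.sqrt(widths[-1])

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
Phi_train, Phi_test = features(X_train), features(X_test)
y_train, y_test = Phi_train @ theta_star, Phi_test @ theta_star

# Trainable readout: ridge regression on the last-layer post-activations.
p = Phi_train.shape[1]
theta_hat = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(p),
                            Phi_train.T @ y_train)

test_error = np.mean((Phi_test @ theta_hat - y_test) ** 2)
print(f"ridge readout test error: {test_error:.4f}")

In the proportional regime studied in the paper, the test error of such a readout concentrates, and it is this limiting value that the deterministic-equivalent formula characterizes.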

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-schroder23a,
  title     = {Deterministic equivalent and error universality of deep random features learning},
  author    = {Schr\"{o}der, Dominik and Cui, Hugo and Dmitriev, Daniil and Loureiro, Bruno},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {30285--30320},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/schroder23a/schroder23a.pdf},
  url       = {https://proceedings.mlr.press/v202/schroder23a.html}
}
Endnote
%0 Conference Paper
%T Deterministic equivalent and error universality of deep random features learning
%A Dominik Schröder
%A Hugo Cui
%A Daniil Dmitriev
%A Bruno Loureiro
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-schroder23a
%I PMLR
%P 30285--30320
%U https://proceedings.mlr.press/v202/schroder23a.html
%V 202
APA
Schröder, D., Cui, H., Dmitriev, D. & Loureiro, B. (2023). Deterministic equivalent and error universality of deep random features learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:30285-30320. Available from https://proceedings.mlr.press/v202/schroder23a.html.
