Universality of empirical risk minimization

Andrea Montanari, Basil N. Saeed
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:4310-4312, 2022.

Abstract

Consider supervised learning from i.i.d. samples {(y_i, x_i)}_{i≤n}, where x_i ∈ R^p are feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of functions that are parameterized by k = O(1) vectors θ_1, ..., θ_k ∈ R^p, and prove universality results for both the training and test error. Namely, under the proportional asymptotics n, p → ∞ with n/p = Θ(1), we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors x_i are replaced by Gaussian vectors g_i with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors x_i with independent entries; our results make neither of these assumptions. Our assumptions are general enough to include feature vectors x_i that are produced by randomized featurization maps. In particular, we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximations of two-layer networks).
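
The following is a minimal numerical sketch of the universality statement, assuming a random features model x_i = ReLU(W z_i) and ridge regression as the empirical risk minimizer (k = 1). The teacher model, dimensions, and regularization below are illustrative assumptions and not the setup of the paper; the point is only that the training error on the random features and on a Gaussian surrogate with matched covariance should be close in the proportional regime.

# Hedged sketch: compare ERM training error on random features vs. a
# Gaussian model with the same covariance. All choices below (ReLU features,
# linear teacher, ridge penalty, dimensions) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 800, 400, 200                 # proportional regime: n/p = Theta(1)

W = rng.standard_normal((p, d)) / np.sqrt(d)       # random first-layer weights
theta_star = rng.standard_normal(p) / np.sqrt(p)   # hypothetical teacher

def labels(X):
    # noisy linear responses in the feature space (illustrative assumption)
    return X @ theta_star + 0.1 * rng.standard_normal(X.shape[0])

def ridge_train_error(X, y, lam=0.1):
    # ERM instance: min_theta ||y - X theta||^2 / (2n) + lam ||theta||^2 / 2
    theta = np.linalg.solve(X.T @ X / len(y) + lam * np.eye(X.shape[1]),
                            X.T @ y / len(y))
    return np.mean((y - X @ theta) ** 2) / 2 + lam * np.sum(theta ** 2) / 2

# Random features: x_i = ReLU(W z_i), z_i ~ N(0, I_d), then centered
Z = rng.standard_normal((n, d))
X = np.maximum(Z @ W.T, 0.0)
X -= X.mean(axis=0)

# Gaussian surrogate: g_i ~ N(0, Sigma) with Sigma matched to the features
Sigma = X.T @ X / n
G = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

print("train error, random features :", ridge_train_error(X, labels(X)))
print("train error, Gaussian model  :", ridge_train_error(G, labels(G)))
# Universality predicts these two quantities converge to the same limit as n, p grow proportionally.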

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-montanari22a,
  title     = {Universality of empirical risk minimization},
  author    = {Montanari, Andrea and Saeed, Basil N.},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {4310--4312},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/montanari22a/montanari22a.pdf},
  url       = {https://proceedings.mlr.press/v178/montanari22a.html}
}
Endnote
%0 Conference Paper
%T Universality of empirical risk minimization
%A Andrea Montanari
%A Basil N. Saeed
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-montanari22a
%I PMLR
%P 4310--4312
%U https://proceedings.mlr.press/v178/montanari22a.html
%V 178
APA
Montanari, A. & Saeed, B.N. (2022). Universality of empirical risk minimization. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:4310-4312. Available from https://proceedings.mlr.press/v178/montanari22a.html.
