Universality of empirical risk minimization
Proceedings of the Thirty Fifth Conference on Learning Theory, PMLR 178:4310-4312, 2022.
Abstract
Consider supervised learning from i.i.d. samples {(y_i, x_i)}_{i≤n} where x_i ∈ R^p are feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of functions that are parameterized by k = O(1) vectors θ_1, ..., θ_k ∈ R^p, and prove universality results both for the training and test error. Namely, under the proportional asymptotics n, p → ∞, with n/p = Θ(1), we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors x_i are replaced by Gaussian vectors g_i with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors x_i with independent entries. Our results do not make any of these assumptions. Our assumptions are general enough to include feature vectors x_i that are produced by randomized featurization maps. In particular, we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).
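The following is a minimal numerical sketch, not taken from the paper, of the kind of comparison the universality statement licenses: it fits a ridge-regularized square-loss ERM (a simple instance with k = 1) once on ReLU random features x_i and once on Gaussian surrogate features g_i with matching mean and covariance, and compares the resulting training errors. The feature map, label model, dimensions, and regularization strength below are illustrative assumptions, not the paper's general setting.

```python
# Sketch: ERM training error on random features vs. Gaussian features with the
# same covariance. All modeling choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, p, d = 2000, 1000, 500      # samples, feature dim, input dim (n/p = Theta(1))
lam = 1e-2                     # ridge regularization strength (illustrative)

# Random-features map: x_i = relu(W z_i) with random first-layer weights W.
W = rng.normal(size=(p, d)) / np.sqrt(d)
Z = rng.normal(size=(n, d))
X = np.maximum(Z @ W.T, 0.0)              # (n, p) random features

# Gaussian surrogate: g_i with the same mean and covariance as x_i
# (estimated empirically here for simplicity).
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
L = np.linalg.cholesky(cov + 1e-6 * np.eye(p))
G = mu + rng.normal(size=(n, p)) @ L.T    # (n, p) Gaussian features

# Illustrative label model: same linear teacher applied to either feature set.
theta_star = rng.normal(size=p) / np.sqrt(p)
yX = X @ theta_star + 0.1 * rng.normal(size=n)
yG = G @ theta_star + 0.1 * rng.normal(size=n)

def ridge_train_error(F, y, lam):
    """Value of the ERM objective (1/n)||y - F theta||^2 + lam ||theta||^2
    at its minimizer, computed from the normal equations."""
    n_, p_ = F.shape
    theta = np.linalg.solve(F.T @ F / n_ + lam * np.eye(p_), F.T @ y / n_)
    resid = y - F @ theta
    return np.mean(resid ** 2) + lam * np.sum(theta ** 2)

print("training error, random features model :", ridge_train_error(X, yX, lam))
print("training error, Gaussian surrogate    :", ridge_train_error(G, yG, lam))
```

Under the paper's proportional asymptotics, the two printed objective values should agree to leading order as n and p grow with n/p fixed; at these finite sizes one only expects them to be close.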