Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices

Cédric Gerbelot, Alia Abbara, Florent Krzakala
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1682-1713, 2020.

Abstract

We consider the problem of learning a coefficient vector $\mathbf{x}_0 \in \mathbb{R}^N$ from noisy linear observations $\mathbf{y} = \mathbf{F}\mathbf{x}_{0}+\mathbf{w} \in \mathbb{R}^M$ in the high-dimensional limit $M,N \to \infty$ with $\alpha \equiv M/N$ fixed. We provide a rigorous derivation of an explicit formula, first conjectured using heuristic methods from statistical physics, for the asymptotic mean squared error obtained by penalized convex estimators such as the LASSO or the elastic net, for a sequence of very generic random matrices $\mathbf{F}$ corresponding to rotationally invariant data matrices of arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices to a large class of problems. We illustrate our results on some concrete examples and show that, even though they are asymptotic, our predictions agree remarkably well with numerics even at very moderate sizes.
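To make the setup concrete, below is a minimal numerical sketch of the experiment the abstract describes: a rotationally invariant sensing matrix $\mathbf{F} = \mathbf{U}\mathbf{S}\mathbf{V}^\top$ with Haar-distributed orthogonal $\mathbf{U}, \mathbf{V}$ and a chosen singular-value spectrum, a sparse signal $\mathbf{x}_0$, and a LASSO fit whose per-coordinate mean squared error is the quantity the paper's formula predicts as $M,N \to \infty$. The spectrum, sparsity level rho, noise level sigma and regularization lam are illustrative choices, not values from the paper, and the theoretical prediction itself (the fixed point of the state evolution equations) is not reproduced here.

import numpy as np
from sklearn.linear_model import Lasso

def haar(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix via QR."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign correction makes the law Haar

rng = np.random.default_rng(0)
N = 500
alpha = 2.0                      # aspect ratio alpha = M / N, held fixed
M = int(alpha * N)

# Rotationally invariant matrix F = U S V^T: Haar orthogonal U, V and an
# arbitrary singular-value spectrum (uniform here, purely for illustration).
s = rng.uniform(0.5, 1.5, size=N)
F = haar(M, rng)[:, :N] * s @ haar(N, rng).T

# Gauss-Bernoulli signal and noisy observations y = F x0 + w.
rho, sigma = 0.3, 0.1            # sparsity and noise level, chosen ad hoc
x0 = rng.standard_normal(N) * (rng.random(N) < rho)
y = F @ x0 + sigma * rng.standard_normal(M)

# LASSO estimate; sklearn minimizes (1/(2M)) * ||y - F x||_2^2 + lam * ||x||_1.
lam = 0.01
xhat = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(F, y).coef_

# Empirical per-coordinate mean squared error; the paper's result gives the
# exact M, N -> infinity limit of this quantity for such matrices.
print("MSE:", np.mean((xhat - x0) ** 2))

At moderate sizes such as $N = 500$, the abstract's claim is that this empirical MSE already concentrates close to the asymptotic prediction obtained from the state evolution equations.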

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-gerbelot20a,
  title     = {Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices},
  author    = {Gerbelot, C\'{e}dric and Abbara, Alia and Krzakala, Florent},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {1682--1713},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/gerbelot20a/gerbelot20a.pdf},
  url       = {https://proceedings.mlr.press/v125/gerbelot20a.html},
  abstract  = {We consider the problem of learning a coefficient vector $\mathbf{x}_0 \in \mathbb{R}^N$ from noisy linear observations $\mathbf{y} = \mathbf{F}\mathbf{x}_{0}+\mathbf{w} \in \mathbb{R}^M$ in the high-dimensional limit $M,N \to \infty$ with $\alpha \equiv M/N$ fixed. We provide a rigorous derivation of an explicit formula, first conjectured using heuristic methods from statistical physics, for the asymptotic mean squared error obtained by penalized convex estimators such as the LASSO or the elastic net, for a sequence of very generic random matrices $\mathbf{F}$ corresponding to rotationally invariant data matrices of arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices to a large class of problems. We illustrate our results on some concrete examples and show that, even though they are asymptotic, our predictions agree remarkably well with numerics even at very moderate sizes.}
}
Endnote
%0 Conference Paper
%T Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices
%A Cédric Gerbelot
%A Alia Abbara
%A Florent Krzakala
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-gerbelot20a
%I PMLR
%P 1682--1713
%U https://proceedings.mlr.press/v125/gerbelot20a.html
%V 125
%X We consider the problem of learning a coefficient vector $\mathbf{x}_0 \in \mathbb{R}^N$ from noisy linear observations $\mathbf{y} = \mathbf{F}\mathbf{x}_{0}+\mathbf{w} \in \mathbb{R}^M$ in the high-dimensional limit $M,N \to \infty$ with $\alpha \equiv M/N$ fixed. We provide a rigorous derivation of an explicit formula, first conjectured using heuristic methods from statistical physics, for the asymptotic mean squared error obtained by penalized convex estimators such as the LASSO or the elastic net, for a sequence of very generic random matrices $\mathbf{F}$ corresponding to rotationally invariant data matrices of arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices to a large class of problems. We illustrate our results on some concrete examples and show that, even though they are asymptotic, our predictions agree remarkably well with numerics even at very moderate sizes.
APA
Gerbelot, C., Abbara, A. & Krzakala, F. (2020). Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:1682-1713. Available from https://proceedings.mlr.press/v125/gerbelot20a.html.
