The estimation error of general first order methods

Michael Celentano, Andrea Montanari, Yuchen Wu
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1078-1141, 2020.

Abstract

Modern large-scale statistical models require the estimation of thousands to millions of parameters. This is often accomplished by iterative algorithms such as gradient descent, projected gradient descent, or their accelerated versions. What are the fundamental limits of these approaches? This question is well understood from an optimization viewpoint when the underlying objective is convex. Work in this area characterizes the gap to global optimality as a function of the number of iterations. However, these results have only indirect implications for the gap to statistical optimality. Here we consider two families of high-dimensional estimation problems, high-dimensional regression and low-rank matrix estimation, and introduce a class of ‘general first order methods’ that aim at efficiently estimating the underlying parameters. This class of algorithms is broad enough to include classical first order optimization (for convex and non-convex objectives), but also other types of algorithms. Under a random design assumption, we derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge. These lower bounds are optimal in the sense that there exist algorithms in this class whose estimation error matches the lower bounds up to asymptotically negligible terms. We illustrate our general results through applications to sparse phase retrieval and sparse principal component analysis.
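For concreteness, here is a minimal sketch, in the notation of a standard linear regression model $y = X\theta_\star + w$, of one algorithm in the family the abstract refers to: a proximal (projected) gradient descent step with step size $\eta_t$ and regularizer $\Omega$. All symbols below are our own illustrative choices, not the paper's formal definition of a general first order method, which is strictly broader.

% Illustrative only: \theta_\star, w, \eta_t and \Omega are assumed notation,
% not taken from the paper. One first-order update for
% \min_\theta \tfrac{1}{2}\|y - X\theta\|_2^2 + \Omega(\theta):
\[
  \hat{\theta}^{\,t+1}
  \;=\;
  \operatorname{prox}_{\eta_t \Omega}\!\left(
      \hat{\theta}^{\,t} \;+\; \eta_t\, X^{\mathsf{T}}\bigl( y - X \hat{\theta}^{\,t} \bigr)
  \right),
  \qquad t = 0, 1, 2, \ldots
\]

Choosing $\Omega$ as the indicator of a convex constraint set recovers projected gradient descent, while $\Omega(\theta) = \lambda \|\theta\|_1$ gives iterative soft thresholding for the Lasso; accelerated and non-convex variants share the same first-order structure. Roughly speaking, the lower bounds announced in the abstract constrain the estimation error achievable by any such iteration after a fixed number of steps, in the limit where the number of observations and parameters diverge.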

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-celentano20a,
  title     = {The estimation error of general first order methods},
  author    = {Celentano, Michael and Montanari, Andrea and Wu, Yuchen},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {1078--1141},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/celentano20a/celentano20a.pdf},
  url       = {https://proceedings.mlr.press/v125/celentano20a.html},
  abstract  = {Modern large-scale statistical models require the estimation of thousands to millions of parameters. This is often accomplished by iterative algorithms such as gradient descent, projected gradient descent or their accelerated versions. What are the fundamental limits of these approaches? This question is well understood from an optimization viewpoint when the underlying objective is convex. Work in this area characterizes the gap to global optimality as a function of the number of iterations. However, these results have only indirect implications on the gap to \emph{statistical} optimality. Here we consider two families of high-dimensional estimation problems: high-dimensional regression and low-rank matrix estimation, and introduce a class of ‘general first order methods’ that aim at efficiently estimating the underlying parameters. This class of algorithms is broad enough to include classical first order optimization (for convex and non-convex objectives), but also other types of algorithms. Under a random design assumption, we derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge. These lower bounds are optimal in the sense that there exist algorithms in this class whose estimation error matches the lower bounds up to asymptotically negligible terms. We illustrate our general results through applications to sparse phase retrieval and sparse principal component analysis.}
}
Endnote
%0 Conference Paper
%T The estimation error of general first order methods
%A Michael Celentano
%A Andrea Montanari
%A Yuchen Wu
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-celentano20a
%I PMLR
%P 1078--1141
%U https://proceedings.mlr.press/v125/celentano20a.html
%V 125
%X Modern large-scale statistical models require the estimation of thousands to millions of parameters. This is often accomplished by iterative algorithms such as gradient descent, projected gradient descent or their accelerated versions. What are the fundamental limits of these approaches? This question is well understood from an optimization viewpoint when the underlying objective is convex. Work in this area characterizes the gap to global optimality as a function of the number of iterations. However, these results have only indirect implications on the gap to \emph{statistical} optimality. Here we consider two families of high-dimensional estimation problems: high-dimensional regression and low-rank matrix estimation, and introduce a class of ‘general first order methods’ that aim at efficiently estimating the underlying parameters. This class of algorithms is broad enough to include classical first order optimization (for convex and non-convex objectives), but also other types of algorithms. Under a random design assumption, we derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge. These lower bounds are optimal in the sense that there exist algorithms in this class whose estimation error matches the lower bounds up to asymptotically negligible terms. We illustrate our general results through applications to sparse phase retrieval and sparse principal component analysis.
APA
Celentano, M., Montanari, A. & Wu, Y. (2020). The estimation error of general first order methods. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:1078-1141. Available from https://proceedings.mlr.press/v125/celentano20a.html.
