ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm

Chris Junchi Li, Wenlong Mou, Martin Wainwright, Michael Jordan
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:909-981, 2022.

Abstract

We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.
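The abstract describes ROOT-SGD only at a high level, as a "recursive averaging of past stochastic gradients." For readers who want a concrete picture, below is a minimal Python sketch of one natural way to implement such a recursively averaged (one-over-t) gradient estimator. The specific recursion, the toy least-squares objective, and all names here are illustrative assumptions, not the paper's exact algorithm; the precise update, initialization, and step-size conditions are given in the paper itself.

import numpy as np

def root_sgd(stoch_grad, sample, x0, n_steps, step_size):
    """Illustrative sketch of a ROOT-SGD-style recursive gradient average.

    stoch_grad(x, xi): stochastic gradient of the objective at x on sample xi.
    sample():          draws one fresh random sample xi.
    NOTE: the recursion below is an *assumed* form of the recursive
    one-over-t averaging; consult the paper for the exact update rule.
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    xi = sample()
    v = stoch_grad(x_prev, xi)           # initialize the estimator with one gradient
    x = x_prev - step_size * v           # first step is a plain SGD step
    for t in range(2, n_steps + 1):
        xi = sample()                    # one fresh sample per iteration
        g_cur = stoch_grad(x, xi)        # gradient at the current iterate
        g_old = stoch_grad(x_prev, xi)   # gradient at the previous iterate, same sample
        # Recursive one-over-t averaging (assumed form): equivalently
        # v = (1/t) * g_cur + (1 - 1/t) * (v + g_cur - g_old).
        v = g_cur + (1.0 - 1.0 / t) * (v - g_old)
        x_prev, x = x, x - step_size * v
    return x

# Toy usage on least squares, f(x) = E[(a^T x - b)^2 / 2], purely for illustration.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, x_star = 5, np.ones(5)

    def sample():
        a = rng.standard_normal(d)
        return a, a @ x_star + 0.1 * rng.standard_normal()

    def stoch_grad(x, xi):
        a, b = xi
        return (a @ x - b) * a

    x_hat = root_sgd(stoch_grad, sample, np.zeros(d), n_steps=5000, step_size=0.05)
    print("distance to x*:", np.linalg.norm(x_hat - x_star))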

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-li22a,
  title     = {ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm},
  author    = {Li, Chris Junchi and Mou, Wenlong and Wainwright, Martin and Jordan, Michael},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {909--981},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/li22a/li22a.pdf},
  url       = {https://proceedings.mlr.press/v178/li22a.html},
  abstract  = {We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.}
}
Endnote
%0 Conference Paper
%T ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm
%A Chris Junchi Li
%A Wenlong Mou
%A Martin Wainwright
%A Michael Jordan
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-li22a
%I PMLR
%P 909--981
%U https://proceedings.mlr.press/v178/li22a.html
%V 178
%X We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.
APA
Li, C. J., Mou, W., Wainwright, M., & Jordan, M. (2022). ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:909-981. Available from https://proceedings.mlr.press/v178/li22a.html.
