Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^-7/4)$ Complexity

Huan Li; Zhouchen Lin

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^-7/4)$ Complexity

Huan Li, Zhouchen Lin

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12901-12916, 2022.

Abstract

This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. We establish that a simple restarted accelerated gradient descent (AGD) finds an

$\epsilon$ -approximate first-order stationary point in

$O(\epsilon^{-7/4})$ gradient computations with simple proofs. Our complexity does not hide any polylogarithmic factors, and thus it improves over the state-of-the-art one by the

$O(\log\frac{1}{\epsilon})$ factor. Our simple algorithm only consists of Nesterov’s classical AGD and a restart mechanism, and it does not need the negative curvature exploitation or the optimization of regularized surrogate functions. Technically, our simple proof does not invoke the analysis for the strongly convex AGD, which is crucial to remove the

$O(\log\frac{1}{\epsilon})$ factor.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-li22o,
  title = 	 {Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^{-7/4})$ Complexity},
  author =       {Li, Huan and Lin, Zhouchen},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {12901--12916},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/li22o/li22o.pdf},
  url = 	 {https://proceedings.mlr.press/v162/li22o.html},
  abstract = 	 {This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. We establish that a simple restarted accelerated gradient descent (AGD) finds an $\epsilon$-approximate first-order stationary point in $O(\epsilon^{-7/4})$ gradient computations with simple proofs. Our complexity does not hide any polylogarithmic factors, and thus it improves over the state-of-the-art one by the $O(\log\frac{1}{\epsilon})$ factor. Our simple algorithm only consists of Nesterov’s classical AGD and a restart mechanism, and it does not need the negative curvature exploitation or the optimization of regularized surrogate functions. Technically, our simple proof does not invoke the analysis for the strongly convex AGD, which is crucial to remove the $O(\log\frac{1}{\epsilon})$ factor.}
}

Endnote

%0 Conference Paper
%T Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^-7/4)$ Complexity
%A Huan Li
%A Zhouchen Lin
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-li22o
%I PMLR
%P 12901--12916
%U https://proceedings.mlr.press/v162/li22o.html
%V 162
%X This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. We establish that a simple restarted accelerated gradient descent (AGD) finds an $\epsilon$-approximate first-order stationary point in $O(\epsilon^{-7/4})$ gradient computations with simple proofs. Our complexity does not hide any polylogarithmic factors, and thus it improves over the state-of-the-art one by the $O(\log\frac{1}{\epsilon})$ factor. Our simple algorithm only consists of Nesterov’s classical AGD and a restart mechanism, and it does not need the negative curvature exploitation or the optimization of regularized surrogate functions. Technically, our simple proof does not invoke the analysis for the strongly convex AGD, which is crucial to remove the $O(\log\frac{1}{\epsilon})$ factor.

APA


Li, H. & Lin, Z.. (2022). Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^-7/4)$ Complexity. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12901-12916 Available from https://proceedings.mlr.press/v162/li22o.html.

Related Material

Download PDF

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(ε^-7/4)O(ε^-7/4) Complexity

Abstract

Cite this Paper

Related Material

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(ε^-7/4)$ Complexity