“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

Yair Carmon; John C. Duchi; Oliver Hinder; Aaron Sidford

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:654-663, 2017.

Abstract

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/ \epsilon) )$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/ \epsilon) )$ evaluations.
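The two-case structure described in the abstract can be illustrated with a simplified sketch. This is not the authors' algorithm: the function name `agd_until_guilty`, the constant-momentum AGD for an assumed strong-convexity parameter `sigma`, and checking only the pair (u, v) = (x_t, y_{t-1}) after each gradient step are all assumptions made for illustration. The sketch only shows the "converge or produce a certificate" idea; the paper's use of the certificate to exploit negative curvature is omitted. A violated inequality proves that `f` is not `sigma`-strongly convex, and if the same pair also violates the plain convexity inequality, it proves `f` is non-convex.

```python
import numpy as np


def agd_until_guilty(f, grad, y0, L, sigma, eps, max_iter=10_000):
    """Hypothetical, simplified sketch of a 'convex until proven guilty' AGD.

    Runs Nesterov's AGD with constant momentum for an assumed sigma-strongly
    convex, L-smooth f. After each gradient step x_t = y_{t-1} - grad(y_{t-1}) / L
    it checks the sigma-strong-convexity inequality for (u, v) = (x_t, y_{t-1}):
        f(u) >= f(v) + <grad(v), u - v> + (sigma / 2) * ||u - v||^2.
    Returns ("converged", x) once ||grad(x)|| <= eps, ("guilty", (u, v)) if the
    inequality fails, or ("max_iter", x) if the iteration budget runs out.
    """
    if not 0 < sigma <= L:
        raise ValueError("need 0 < sigma <= L")
    kappa = L / sigma
    omega = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # constant momentum weight
    x_prev = y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        g = grad(y)
        x = y - g / L  # gradient step from the extrapolated point
        d = x - y
        # Slack in the sigma-strong-convexity inequality for u = x, v = y.
        gap = f(x) - f(y) - g @ d - 0.5 * sigma * (d @ d)
        if gap < 0:
            return "guilty", (x, y)  # the pair (u, v) is the certificate
        if np.linalg.norm(grad(x)) <= eps:
            return "converged", x
        y = x + omega * (x - x_prev)  # Nesterov extrapolation
        x_prev = x
    return "max_iter", x_prev
```

On a convex quadratic such as `0.5 * x @ A @ x` with `A = np.diag([1.0, 10.0])`, `L=10.0`, and `sigma=0.5`, the sketch behaves like ordinary AGD and returns `"converged"`. On the indefinite quadratic with `A = np.diag([1.0, -1.0])`, started at `[0.1, 1.0]`, the first step already yields a pair that violates even the plain convexity inequality.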

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-carmon17a,
  title = 	 {``{C}onvex Until Proven Guilty'': Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions},
  author =       {Yair Carmon and John C. Duchi and Oliver Hinder and Aaron Sidford},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {654--663},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/carmon17a/carmon17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/carmon17a.html},
  abstract = 	 {We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/ \epsilon) )$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/ \epsilon) )$ evaluations.}
}

Endnote

%0 Conference Paper
%T “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions
%A Yair Carmon
%A John C. Duchi
%A Oliver Hinder
%A Aaron Sidford
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-carmon17a
%I PMLR
%P 654--663
%U https://proceedings.mlr.press/v70/carmon17a.html
%V 70
%X We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/ \epsilon) )$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/ \epsilon) )$ evaluations.

APA


Carmon, Y., Duchi, J.C., Hinder, O. & Sidford, A. (2017). “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:654-663. Available from https://proceedings.mlr.press/v70/carmon17a.html.
