“Convex Until Proven Guilty”: DimensionFree Acceleration of Gradient Descent on NonConvex Functions
[edit]
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:654663, 2017.
Abstract
We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth nonconvex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being nonconvex. This nonconvexity certificate allows us to exploit negative curvature and obtain deterministic, dimensionfree acceleration of convergence for nonconvex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\\nabla f(x)\ \le \epsilon$ in $O(\epsilon^{7/4} \log(1/ \epsilon) )$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{5/3} \log(1/ \epsilon) )$ evaluations.
Related Material


