Sharp Analysis for Nonconvex SGD Escaping from Saddle Points
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1192-1234, 2019.
Abstract
In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least $O(\epsilon^{-4})$ stochastic gradient computations for obtaining an $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point. This SGD rate matches, up to a polylogarithmic factor of problem-dependent parameters, the rate of most accelerated nonconvex stochastic optimization algorithms that adopt additional techniques, such as Nesterov's momentum acceleration, negative curvature search, and quadratic and cubic regularization tricks. Our novel analysis gives new insights into nonconvex SGD and can potentially be generalized to a broad class of stochastic optimization algorithms.
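For reference, the abstract does not spell out the stationarity notion. A common formalization, assumed here rather than quoted from the paper, calls a point $x$ an $(\epsilon, \delta)$-approximate second-order stationary point of $f$ when

$$\|\nabla f(x)\| \le \epsilon \quad \text{and} \quad \lambda_{\min}\big(\nabla^2 f(x)\big) \ge -\delta,$$

with $\delta = O(\epsilon^{0.5})$ in the result above. Under this reading, the improvement over the classical bound is a factor of $\epsilon^{-4} / \epsilon^{-3.5} = \epsilon^{-0.5}$, up to polylogarithmic terms.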