Variance Reduction for Faster Non-Convex Optimization
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:699-707, 2016.
Abstract
We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in O(1/\varepsilon) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/\varepsilon^2) iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges with an O(1/\varepsilon) rate, and is faster than full gradient descent by \Omega(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
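As a rough illustration of the variance reduction trick the abstract refers to, the sketch below runs a generic SVRG-style loop on a toy finite-sum objective. It is not the paper's algorithm or parameter choice: the abstract does not specify minibatch sizes, epoch lengths, or step sizes, so `step`, `epochs`, `inner_steps`, the scalar loss `u^2 / (1 + u^2)`, and all function names here are assumptions made for illustration only.

```python
import random


def grad_i(x, c):
    """Gradient of the assumed toy loss f_i(x) = u^2 / (1 + u^2), u = x - c.

    This loss is smooth and non-convex in x; it is a stand-in, not from the paper.
    """
    u = x - c
    return 2.0 * u / (1.0 + u * u) ** 2


def full_grad(x, centers):
    """Gradient of the average f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(x, c) for c in centers) / len(centers)


def svrg(centers, x0, step=0.1, epochs=50, inner_steps=None, seed=0):
    """Generic SVRG-style loop (hypothetical parameters, single-sample updates)."""
    rng = random.Random(seed)
    n = len(centers)
    m = n if inner_steps is None else inner_steps
    x = x0
    for _ in range(epochs):
        # Take a snapshot and compute its full gradient once per epoch.
        snapshot = x
        mu = full_grad(snapshot, centers)
        for _ in range(m):
            c = centers[rng.randrange(n)]
            # Variance-reduced estimator: unbiased for the full gradient at x,
            # with variance that shrinks as x and the snapshot get close.
            g = grad_i(x, c) - grad_i(snapshot, c) + mu
            x -= step * g
    return x


if __name__ == "__main__":
    centers = [0.8, 0.9, 1.0, 1.1, 1.2]
    x_final = svrg(centers, x0=0.5)
    print(x_final, full_grad(x_final, centers))
```

The point of the estimator `grad_i(x, c) - grad_i(snapshot, c) + mu` is that each inner step costs only two component-gradient evaluations, while one full gradient is paid per epoch; how to balance the two in the non-convex setting is what the paper's analysis addresses.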