Variance Reduction for Faster Non-Convex Optimization

Zeyuan Allen-Zhu; Elad Hazan

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:699-707, 2016.

Abstract

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, throughout the long history of this basic problem the only known theoretical results for first-order non-convex optimization have remained full gradient descent, which converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/ε^2) iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, together with a new analysis of variance reduction suited to non-convex optimization. For objectives that are sums of n smooth functions, our first-order minibatch stochastic method converges at an O(1/ε) rate and is faster than full gradient descent by a factor of Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
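The abstract's method builds on variance reduction for finite sums f(w) = (1/n) Σ_i f_i(w). Since one full-gradient step costs n component gradients, full gradient descent needs on the order of n/ε gradient evaluations; an Ω(n^{1/3}) speedup over that therefore means at most on the order of n^{2/3}/ε. The sketch below shows the SVRG-style estimator v = ∇f_I(w) − ∇f_I(w̃) + ∇f(w̃) (I a random minibatch, w̃ a periodic snapshot) on a non-convex loss. It is a minimal illustration under stated assumptions, not the authors' exact algorithm: the sigmoid loss, the synthetic data, and the parameters `eta`, `epochs`, `m`, `batch` are all hypothetical choices, and the sketch returns the last iterate rather than following the parameter and output rules prescribed by the paper's analysis.

```python
import numpy as np


def sigmoid_loss(w, X, y):
    """Average sigmoid loss f(w) = (1/n) * sum_i 1 / (1 + exp(y_i <x_i, w>)).

    Smooth but non-convex; chosen here purely for illustration (assumption).
    Uses 1/(1+e^z) = 0.5 * (1 - tanh(z/2)) to avoid overflow in exp.
    """
    z = y * (X @ w)
    return float(np.mean(0.5 * (1.0 - np.tanh(z / 2.0))))


def minibatch_grad(w, X, y, idx):
    """Average gradient of the sigmoid loss over the rows listed in idx."""
    z = y[idx] * (X[idx] @ w)
    s = 0.5 * (1.0 - np.tanh(z / 2.0))      # s = f_i(w)
    coef = -s * (1.0 - s) * y[idx]          # chain rule: grad f_i(w) = -s(1-s) y_i x_i
    return (coef[:, None] * X[idx]).mean(axis=0)


def svrg_nonconvex(X, y, eta, epochs, m, batch, seed=0):
    """Minibatch SVRG-style loop with hypothetical parameters, last-iterate output.

    Returns the final iterate and the full-gradient norm at each snapshot,
    plus the full-gradient norm at the final iterate.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n)
    w = np.zeros(d)
    grad_norms = []
    for _ in range(epochs):
        snapshot = w.copy()
        full_grad = minibatch_grad(snapshot, X, y, all_idx)  # costs n component gradients
        grad_norms.append(float(np.linalg.norm(full_grad)))
        for _ in range(m):
            idx = rng.integers(0, n, size=batch)  # uniform sampling with replacement
            # Variance-reduced estimator: unbiased for the full gradient at w,
            # with variance that shrinks as w approaches the snapshot.
            v = (minibatch_grad(w, X, y, idx)
                 - minibatch_grad(snapshot, X, y, idx)
                 + full_grad)
            w = w - eta * v
    grad_norms.append(float(np.linalg.norm(minibatch_grad(w, X, y, all_idx))))
    return w, grad_norms


# Demo on synthetic, linearly separable data (hypothetical setup).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows keep the smoothness constant small
w_star = rng.normal(size=5)
y = np.where(X @ w_star >= 0, 1.0, -1.0)
w_hat, grad_norms = svrg_nonconvex(X, y, eta=1.0, epochs=20, m=20, batch=10)
print("loss:", sigmoid_loss(np.zeros(5), X, y), "->", sigmoid_loss(w_hat, X, y))
print("||grad f||:", round(grad_norms[0], 4), "->", round(grad_norms[-1], 4))
```

In this sketch each epoch spends n + 2·m·batch component-gradient evaluations (one full pass at the snapshot, then two minibatch gradients per inner step), and progress toward a stationary point is tracked through the full-gradient norm ||∇f(w)||.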

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-allen-zhua16,
  title = 	 {Variance Reduction for Faster Non-Convex Optimization},
  author = 	 {Allen-Zhu, Zeyuan and Hazan, Elad},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {699--707},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/allen-zhua16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/allen-zhua16.html},
  abstract = 	 {We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent that converges in $O(1/\varepsilon^2)$ iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$. We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.}
}

Endnote

%0 Conference Paper
%T Variance Reduction for Faster Non-Convex Optimization
%A Zeyuan Allen-Zhu
%A Elad Hazan
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-allen-zhua16
%I PMLR
%P 699--707
%U https://proceedings.mlr.press/v48/allen-zhua16.html
%V 48
%X We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent that converges in O(1/ε^2) iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an O(1/ε) rate, and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.

RIS


TY  - CPAPER
TI  - Variance Reduction for Faster Non-Convex Optimization
AU  - Zeyuan Allen-Zhu
AU  - Elad Hazan
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-allen-zhua16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 699
EP  - 707
L1  - http://proceedings.mlr.press/v48/allen-zhua16.pdf
UR  - https://proceedings.mlr.press/v48/allen-zhua16.html
AB  - We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent that converges in O(1/ε^2) iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an O(1/ε) rate, and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.
ER  -

APA


Allen-Zhu, Z. & Hazan, E. (2016). Variance Reduction for Faster Non-Convex Optimization. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:699-707. Available from https://proceedings.mlr.press/v48/allen-zhua16.html.
