Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani; Yair Carmon; John C. Duchi; Dylan J. Foster; Ayush Sekhari; Karthik Sridharan

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:242-299, 2020.

Abstract

We design an algorithm which finds an

$\epsilon$ -approximate stationary point (with

$\|\nabla F(x)\|\le \epsilon$ ) using

$O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic

$p$ th order methods for any

$p\ge 2$ , even when the first

$p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding

$(\epsilon,\gamma)$ -approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-arjevani20a,
  title = 	 {Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations},
  author =       {Arjevani, Yossi and Carmon, Yair and Duchi, John C. and Foster, Dylan J. and Sekhari, Ayush and Sridharan, Karthik},
  booktitle = 	 {Proceedings of Thirty Third Conference on Learning Theory},
  pages = 	 {242--299},
  year = 	 {2020},
  editor = 	 {Abernethy, Jacob and Agarwal, Shivani},
  volume = 	 {125},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--12 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v125/arjevani20a/arjevani20a.pdf},
  url = 	 {https://proceedings.mlr.press/v125/arjevani20a.html},
  abstract = 	 { We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.}
}

Endnote

%0 Conference Paper
%T Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
%A Yossi Arjevani
%A Yair Carmon
%A John C. Duchi
%A Dylan J. Foster
%A Ayush Sekhari
%A Karthik Sridharan
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal	
%F pmlr-v125-arjevani20a
%I PMLR
%P 242--299
%U https://proceedings.mlr.press/v125/arjevani20a.html
%V 125
%X  We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.

APA


Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Sekhari, A. & Sridharan, K.. (2020). Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:242-299 Available from https://proceedings.mlr.press/v125/arjevani20a.html.

Related Material

Download PDF