Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:242-299, 2020.

Abstract

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
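To make the oracle model concrete, below is a minimal sketch (not the paper's algorithm; consult the linked PDF for the actual method and parameter choices) of a variance-reduced gradient estimator that is corrected with stochastic Hessian-vector products along the iterate path, rather than by re-querying the gradient oracle at two points with the same random seed. The oracles `stoch_grad(x)` and `stoch_hvp(x, v)`, and all step-size/batch parameters, are illustrative assumptions.

import numpy as np

def find_stationary_point(x0, stoch_grad, stoch_hvp, step_size=0.01,
                          restart_every=100, batch_size=100, max_iters=10_000,
                          rng=None):
    """Sketch of a variance-reduced search for an approximate stationary point.

    `stoch_grad(x)` and `stoch_hvp(x, v)` are assumed to be single-sample
    oracles returning an unbiased stochastic gradient and Hessian-vector
    product, respectively.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    # Initialize the gradient estimate with a minibatch of fresh samples.
    g = np.mean([stoch_grad(x) for _ in range(batch_size)], axis=0)
    for t in range(1, max_iters + 1):
        # Normalized gradient step keeps the per-iteration movement bounded.
        x_next = x - step_size * g / max(np.linalg.norm(g), 1e-12)
        # Correct the running estimate with a stochastic Hessian-vector
        # product at a random point on the segment [x, x_next]; this tracks
        # the change in the gradient using only fresh, independent samples.
        xi = x + rng.uniform() * (x_next - x)
        g = g + stoch_hvp(xi, x_next - x)
        x = x_next
        # Periodically refresh the estimate to control accumulated error.
        if t % restart_every == 0:
            g = np.mean([stoch_grad(x) for _ in range(batch_size)], axis=0)
    return x

With the restart period, batch size, and step size tuned to the target accuracy $\epsilon$, estimators of this general form are how a budget of stochastic gradients and Hessian-vector products can be used without assuming repeated queries under a shared random seed.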

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-arjevani20a,
  title     = {Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations},
  author    = {Arjevani, Yossi and Carmon, Yair and Duchi, John C. and Foster, Dylan J. and Sekhari, Ayush and Sridharan, Karthik},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {242--299},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/arjevani20a/arjevani20a.pdf},
  url       = {https://proceedings.mlr.press/v125/arjevani20a.html},
  abstract  = {We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.}
}
Endnote
%0 Conference Paper
%T Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
%A Yossi Arjevani
%A Yair Carmon
%A John C. Duchi
%A Dylan J. Foster
%A Ayush Sekhari
%A Karthik Sridharan
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-arjevani20a
%I PMLR
%P 242--299
%U https://proceedings.mlr.press/v125/arjevani20a.html
%V 125
%X We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
APA
Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Sekhari, A. & Sridharan, K. (2020). Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:242-299. Available from https://proceedings.mlr.press/v125/arjevani20a.html.