A Linearly-Convergent Stochastic L-BFGS Algorithm

Philipp Moritz, Robert Nishihara, Michael Jordan
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:249-258, 2016.

Abstract

We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide range of step sizes, often differing by several orders of magnitude.
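The abstract only sketches the method at a high level: an SVRG-style control variate (Johnson and Zhang, 2013) supplies low-variance stochastic gradients, and an L-BFGS-style two-loop recursion (Byrd et al., 2014) turns them into search directions. The NumPy sketch below illustrates that combination under simplifying assumptions; it is not the authors' reference implementation, and names such as svrg_lbfgs, memory, and update_every are illustrative. In particular, it forms curvature pairs from gradient differences of averaged iterates rather than the subsampled Hessian-vector products used in the paper.

# Minimal sketch of variance-reduced stochastic L-BFGS (illustrative, not the
# authors' code): SVRG control variates combined with L-BFGS curvature pairs.
import numpy as np

def two_loop(grad, s_list, y_list):
    """Classical L-BFGS two-loop recursion: approximates H * grad."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        alphas.append(a)
        q -= a * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= s.dot(y) / y.dot(y)      # initial Hessian scaling
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / y.dot(s)
        b = rho * y.dot(q)
        q += (a - b) * s
    return q

def svrg_lbfgs(grad_i, full_grad, w0, n, eta=0.01, epochs=10, inner=None,
               memory=10, update_every=10, rng=None):
    """grad_i(w, i): gradient of the i-th component; full_grad(w): full gradient."""
    rng = rng or np.random.default_rng(0)
    inner = inner or n
    w = w0.copy()
    s_list, y_list = [], []
    prev_avg = None
    for _ in range(epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)            # full gradient at the snapshot point
        iterates = []
        for t in range(inner):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient (SVRG control variate).
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            d = two_loop(g, s_list, y_list) if s_list else g
            w = w - eta * d
            iterates.append(w.copy())
            # Periodically form a curvature pair from averaged iterates; the
            # gradient difference here is a stand-in for the paper's
            # subsampled Hessian-vector product.
            if (t + 1) % update_every == 0:
                avg = np.mean(iterates[-update_every:], axis=0)
                if prev_avg is not None:
                    s = avg - prev_avg
                    y = full_grad(avg) - full_grad(prev_avg)
                    if y.dot(s) > 1e-10:   # keep only curvature pairs with positive s'y
                        s_list.append(s)
                        y_list.append(y)
                        if len(s_list) > memory:
                            s_list.pop(0)
                            y_list.pop(0)
                prev_avg = avg
    return w

For a smooth, strongly convex objective such as an L2-regularized logistic loss, the step size eta can be held fixed; the paper's linear-rate guarantee is what motivates using a constant step size rather than a decaying schedule.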

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-moritz16,
  title     = {A Linearly-Convergent Stochastic L-BFGS Algorithm},
  author    = {Moritz, Philipp and Nishihara, Robert and Jordan, Michael},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {249--258},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/moritz16.pdf},
  url       = {https://proceedings.mlr.press/v51/moritz16.html},
  abstract  = {We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide range of step sizes, often differing by several orders of magnitude.}
}
APA
Moritz, P., Nishihara, R. & Jordan, M. (2016). A Linearly-Convergent Stochastic L-BFGS Algorithm. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:249-258. Available from https://proceedings.mlr.press/v51/moritz16.html.
