Sub-sampled Cubic Regularization for Non-convex Optimization

Jonas Moritz Kohler, Aurelien Lucchi
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1895-1904, 2017.

Abstract

We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and provides stronger convergence guarantees than first-order, second-order, and classical trust region methods. However, its high computational complexity makes it impractical for large-scale learning. Here, we propose a novel method that uses sub-sampling to lower this computational cost. Using concentration inequalities, we derive a sampling scheme that yields sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods. To the best of our knowledge, this is the first work to give global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions. Furthermore, we provide experimental results supporting our theory.
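A minimal sketch may help make the abstract concrete. The Python fragment below is an illustration only, not the authors' implementation: the callbacks f, grad_batch, and hess_batch, the batch sizes, and the step-control constants eta and gamma are hypothetical placeholders (the paper instead derives the required sample sizes from concentration inequalities and uses a more sophisticated subproblem solver). It shows one iteration of sub-sampled cubic regularization: sub-sample the gradient g and Hessian H, approximately minimize the cubic model m(s) = g^T s + (1/2) s^T H s + (sigma/3)||s||^3, and accept or reject the step with a trust-region-style ratio test.

    import numpy as np

    def cubic_model(g, H, sigma, s):
        # m(s) = g^T s + (1/2) s^T H s + (sigma/3) ||s||^3
        return g @ s + 0.5 * s @ (H @ s) + (sigma / 3.0) * np.linalg.norm(s) ** 3

    def solve_subproblem(g, H, sigma, iters=200, lr=0.01):
        # Approximately minimize the cubic model by gradient descent on m;
        # a simple stand-in for the Lanczos-type solvers used in practice.
        s = np.zeros_like(g)
        for _ in range(iters):
            grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
            s = s - lr * grad_m
        return s

    def scr_step(x, f, grad_batch, hess_batch, n, sigma,
                 batch_g=256, batch_h=64, eta=0.1, gamma=2.0,
                 rng=np.random.default_rng(0)):
        # Draw independent index sets for the gradient and Hessian estimates.
        idx_g = rng.choice(n, size=min(batch_g, n), replace=False)
        idx_h = rng.choice(n, size=min(batch_h, n), replace=False)
        g = grad_batch(x, idx_g)  # sub-sampled gradient, averaged over idx_g
        H = hess_batch(x, idx_h)  # sub-sampled Hessian, averaged over idx_h

        s = solve_subproblem(g, H, sigma)
        m = cubic_model(g, H, sigma, s)

        # Trust-region-style test: actual vs. model-predicted decrease.
        rho = (f(x) - f(x + s)) / max(-m, 1e-12)
        if rho >= eta:
            return x + s, sigma / gamma  # success: accept step, relax penalty
        return x, sigma * gamma          # failure: reject step, raise penalty

Intuitively, the adaptive penalty sigma plays the role of an inverse trust-region radius: cheap, slightly noisy gradient and Hessian estimates are tolerable because an unsuccessful step is simply rejected and sigma inflated, which is what allows sub-sampling to preserve the convergence guarantees.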

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-kohler17a,
  title     = {Sub-sampled Cubic Regularization for Non-convex Optimization},
  author    = {Jonas Moritz Kohler and Aurelien Lucchi},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1895--1904},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/kohler17a/kohler17a.pdf},
  url       = {https://proceedings.mlr.press/v70/kohler17a.html},
  abstract  = {We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and it provides stronger convergence guarantees than first- and second-order as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale learning. Here, we propose a novel method that uses sub-sampling to lower this computational cost. By the use of concentration inequalities we provide a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods. To the best of our knowledge this is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions. Furthermore, we provide experimental results supporting our theory.}
}
EndNote
%0 Conference Paper
%T Sub-sampled Cubic Regularization for Non-convex Optimization
%A Jonas Moritz Kohler
%A Aurelien Lucchi
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-kohler17a
%I PMLR
%P 1895--1904
%U https://proceedings.mlr.press/v70/kohler17a.html
%V 70
%X We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and it provides stronger convergence guarantees than first- and second-order as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale learning. Here, we propose a novel method that uses sub-sampling to lower this computational cost. By the use of concentration inequalities we provide a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods. To the best of our knowledge this is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions. Furthermore, we provide experimental results supporting our theory.
APA
Kohler, J.M. & Lucchi, A. (2017). Sub-sampled Cubic Regularization for Non-convex Optimization. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1895-1904. Available from https://proceedings.mlr.press/v70/kohler17a.html.
