Adaptive Gradient Descent without Descent

Yura Malitsky, Konstantin Mishchenko
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6702-6712, 2020.

Abstract

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don’t increase the stepsize too fast and 2) don’t overstep the local curvature. There is no need for function values, no line search, and no information about the function beyond its gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on the smoothness in a neighborhood of a solution. Provided the problem is convex, our method converges even if the global smoothness constant is infinite. As an illustration, it can minimize any twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
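The two rules above translate into a stepsize update that needs nothing beyond consecutive iterates and their gradients. The Python sketch below is an illustrative reading of the abstract, not a reproduction of the paper's pseudocode: the growth cap sqrt(1 + theta) * lambda_prev and the curvature bound ||x_k - x_{k-1}|| / (2 ||grad f(x_k) - grad f(x_{k-1})||) follow the commonly cited form of the authors' method, but the exact constants and initialization should be checked against the PDF linked below; the names adgd, grad, lam0, and iters are illustrative.

import numpy as np

def adgd(grad, x0, iters=1000, lam0=1e-6):
    """Sketch of adaptive gradient descent driven by the two rules in the
    abstract. `grad` returns the gradient of the objective at a point."""
    x_prev, g_prev = x0, grad(x0)
    x = x0 - lam0 * g_prev          # tiny first step to obtain two iterates
    lam_prev, theta = lam0, np.inf  # theta tracks the ratio of successive stepsizes
    for _ in range(iters):
        g = grad(x)
        diff_x = np.linalg.norm(x - x_prev)
        diff_g = np.linalg.norm(g - g_prev)
        # Rule 2: don't overstep the local curvature (inverse of a local
        # Lipschitz estimate for the gradient).
        curvature_cap = 0.5 * diff_x / diff_g if diff_g > 0 else np.inf
        # Rule 1: don't increase the stepsize too fast.
        growth_cap = np.sqrt(1 + theta) * lam_prev
        lam = min(growth_cap, curvature_cap)
        theta, lam_prev = lam / lam_prev, lam
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

# Toy usage: a badly conditioned quadratic, with no smoothness constant supplied.
A = np.diag([1.0, 10.0, 100.0])
x_min = adgd(lambda x: A @ x, x0=np.ones(3))

Returning the last iterate keeps the sketch short; the paper's convex-case guarantees may be stated for an averaged iterate, so consult the PDF for the precise statement.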

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-malitsky20a,
  title     = {Adaptive Gradient Descent without Descent},
  author    = {Malitsky, Yura and Mishchenko, Konstantin},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {6702--6712},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/malitsky20a/malitsky20a.pdf},
  url       = {https://proceedings.mlr.press/v119/malitsky20a.html}
}
Endnote
%0 Conference Paper
%T Adaptive Gradient Descent without Descent
%A Yura Malitsky
%A Konstantin Mishchenko
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-malitsky20a
%I PMLR
%P 6702--6712
%U https://proceedings.mlr.press/v119/malitsky20a.html
%V 119
APA
Malitsky, Y. & Mishchenko, K. (2020). Adaptive Gradient Descent without Descent. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6702-6712. Available from https://proceedings.mlr.press/v119/malitsky20a.html.
