Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7555-7564, 2021.

Abstract

Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{-c \cdot t})$ (where $c > 0$) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware gradient descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
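To make the gradient-normalization idea mentioned in the abstract concrete, the following is a minimal sketch that applies a normalized gradient-descent update to a logistic-regression loss, a simple instance of a GLM. It is only an illustration of rescaling the step by the gradient norm; it is not the paper's exact geometry-aware algorithm or step-size schedule, and the toy data, step size eta, and iteration count are placeholder assumptions.

# Minimal sketch (assumed setup, not the paper's exact method): gradient descent
# whose step is rescaled by the gradient norm, on a toy logistic-regression GLM.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # toy design matrix (assumption)
w_true = rng.normal(size=5)
y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)  # toy 0/1 labels

def loss_and_grad(w):
    """Mean logistic loss and its gradient for the toy GLM."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = np.mean(np.logaddexp(0.0, -(2 * y - 1) * z))
    grad = X.T @ (p - y) / X.shape[0]
    return loss, grad

def normalized_gd(w0, eta=0.1, iters=500, eps=1e-12):
    """Gradient descent with the update direction normalized by its norm."""
    w = w0.copy()
    for _ in range(iters):
        _, g = loss_and_grad(w)
        w -= eta * g / (np.linalg.norm(g) + eps)   # normalized (geometry-aware) step
    return w

w_hat = normalized_gd(np.zeros(5))
print("final loss:", loss_and_grad(w_hat)[0])

Replacing the normalized step with the plain update w -= eta * g recovers standard gradient descent, which is the baseline the paper's analysis compares against.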

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-mei21a,
  title     = {Leveraging Non-uniformity in First-order Non-convex Optimization},
  author    = {Mei, Jincheng and Gao, Yue and Dai, Bo and Szepesvari, Csaba and Schuurmans, Dale},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7555--7564},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/mei21a/mei21a.pdf},
  url       = {https://proceedings.mlr.press/v139/mei21a.html}
}
Endnote
%0 Conference Paper
%T Leveraging Non-uniformity in First-order Non-convex Optimization
%A Jincheng Mei
%A Yue Gao
%A Bo Dai
%A Csaba Szepesvari
%A Dale Schuurmans
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-mei21a
%I PMLR
%P 7555--7564
%U https://proceedings.mlr.press/v139/mei21a.html
%V 139
APA
Mei, J., Gao, Y., Dai, B., Szepesvari, C. & Schuurmans, D. (2021). Leveraging Non-uniformity in First-order Non-convex Optimization. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7555-7564. Available from https://proceedings.mlr.press/v139/mei21a.html.
