Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization

Ziyi Chen, Yi Zhou, Yingbin Liang, Zhaosong Lu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:5396-5427, 2023.

Abstract

Various optimal gradient-based algorithms have been developed for smooth nonconvex optimization. However, many nonconvex machine learning problems do not belong to the class of smooth functions and therefore the existing algorithms are sub-optimal. Instead, these problems have been shown to satisfy certain generalized-smooth conditions, which have not been well understood in the existing literature. In this paper, we propose a notion of $\alpha$-symmetric generalized-smoothness that substantially extends the existing notions and covers many important functions such as high-order polynomials and exponential functions. We study the fundamental properties and establish descent lemmas for the functions in this class. Then, to solve such a large class of nonconvex problems, we design a special deterministic normalized gradient descent algorithm that achieves the optimal iteration complexity $\mathcal{O}(\epsilon^{-2})$, and also prove that the popular SPIDER variance reduction algorithm achieves the optimal sample complexity $\mathcal{O}(\epsilon^{-3})$. Our results show that solving generalized-smooth nonconvex problems is as efficient as solving smooth nonconvex problems.
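As a concrete illustration of the algorithmic idea, below is a minimal sketch of a generic normalized gradient descent loop. It is not the exact algorithm or step-size schedule analyzed in the paper; the toy objective, its gradient grad_f, and the constants gamma and beta are placeholder choices for illustration. Normalizing the gradient keeps each update bounded, which is what makes such methods suitable for objectives whose smoothness constant grows with the gradient norm.

import numpy as np

def normalized_gd(grad_f, x0, gamma=0.1, beta=1e-8, num_iters=1000):
    # Generic normalized gradient descent: because the gradient direction is
    # normalized, each step has length at most roughly gamma, so the updates
    # stay controlled even where the gradient (and hence the local smoothness
    # constant) is large.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad_f(x)
        x = x - gamma * g / (np.linalg.norm(g) + beta)
    return x

# Toy objective f(x) = ||x||^4: a high-order polynomial whose Hessian grows
# like ||x||^2, so it is not globally smooth but is generalized-smooth.
grad_f = lambda x: 4.0 * np.dot(x, x) * x
print(normalized_gd(grad_f, x0=np.array([2.0, -1.0])))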

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chen23ar,
  title     = {Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization},
  author    = {Chen, Ziyi and Zhou, Yi and Liang, Yingbin and Lu, Zhaosong},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {5396--5427},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chen23ar/chen23ar.pdf},
  url       = {https://proceedings.mlr.press/v202/chen23ar.html}
}
Endnote
%0 Conference Paper
%T Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
%A Ziyi Chen
%A Yi Zhou
%A Yingbin Liang
%A Zhaosong Lu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chen23ar
%I PMLR
%P 5396--5427
%U https://proceedings.mlr.press/v202/chen23ar.html
%V 202
APA
Chen, Z., Zhou, Y., Liang, Y., & Lu, Z. (2023). Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:5396-5427. Available from https://proceedings.mlr.press/v202/chen23ar.html.