On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition

Yunyan Bai, Yuxing Liu, Luo Luo
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:2392-2417, 2024.

Abstract

This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing a PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$, and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ on the number of communication rounds, the time cost, and the number of local first-order oracle calls, respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
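
For reference, the terminology above corresponds to the following standard definitions, stated in the abstract's notation with $f^\star\triangleq\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})$ (this is a paraphrase of the usual conventions, not a quotation from the paper; consult the paper for the precise statements):

$f(\cdot)$ satisfies the PL condition with parameter $\mu>0$ if $\frac{1}{2}\|\nabla f({\bf x})\|^2\ge\mu\big(f({\bf x})-f^\star\big)$ for all ${\bf x}\in{\mathbb R}^d$.

$\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth if $\frac{1}{n}\sum_{i=1}^n\|\nabla f_i({\bf x})-\nabla f_i({\bf y})\|^2\le L^2\|{\bf x}-{\bf y}\|^2$ for all ${\bf x},{\bf y}\in{\mathbb R}^d$.

A point $\hat{{\bf x}}$ is $\epsilon$-suboptimal if $f(\hat{{\bf x}})-f^\star\le\epsilon$ (in expectation, for randomized methods).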

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-bai24c,
  title     = {On the Complexity of Finite-Sum Smooth Optimization under the Polyak–{Ł}ojasiewicz Condition},
  author    = {Bai, Yunyan and Liu, Yuxing and Luo, Luo},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {2392--2417},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/bai24c/bai24c.pdf},
  url       = {https://proceedings.mlr.press/v235/bai24c.html},
  abstract  = {This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing a PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$, and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ on the number of communication rounds, the time cost, and the number of local first-order oracle calls, respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.}
}
Endnote
%0 Conference Paper
%T On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition
%A Yunyan Bai
%A Yuxing Liu
%A Luo Luo
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-bai24c
%I PMLR
%P 2392--2417
%U https://proceedings.mlr.press/v235/bai24c.html
%V 235
%X This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing a PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$, and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ on the number of communication rounds, the time cost, and the number of local first-order oracle calls, respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
APA
Bai, Y., Liu, Y., & Luo, L. (2024). On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:2392-2417. Available from https://proceedings.mlr.press/v235/bai24c.html.
