On the Lower Bound of Minimizing Polyak-Łojasiewicz functions
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:2948-2968, 2023.
Abstract
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is weaker than strong convexity but suffices to ensure global convergence of the Gradient Descent algorithm. In this paper, we study the lower bound for algorithms that use first-order oracles to find an approximate optimal solution. We show that any first-order algorithm requires at least ${\Omega}\left(\frac{L}{\mu}\log\frac{1}{\epsilon}\right)$ gradient costs to find an $\epsilon$-approximate optimal solution for a general $L$-smooth function with a $\mu$-PL constant. This result demonstrates the optimality of the Gradient Descent algorithm for minimizing smooth PL functions, in the sense that there exists a “hard” PL function on which no first-order algorithm can be faster than Gradient Descent up to a numerical constant. In contrast, it is well known that the momentum technique, e.g. Nesterov (2003, chap. 2), can provably accelerate Gradient Descent to ${O}\left(\sqrt{\frac{L}{\hat{\mu}}}\log\frac{1}{\epsilon}\right)$ gradient costs for functions that are $L$-smooth and $\hat{\mu}$-strongly convex. Therefore, our result separates the hardness of minimizing a smooth PL function from that of minimizing a smooth strongly convex function: the complexity of the former cannot be improved by any polynomial order in general.
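For context, the standard PL inequality and the Gradient Descent guarantee it implies (a textbook statement, not quoted from the paper) can be sketched as follows: a function $f$ with minimum value $f^{*}$ is $\mu$-PL if
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr) \quad \text{for all } x,
\]
and for an $L$-smooth such $f$, Gradient Descent with step size $1/L$,
\[
  x_{k+1} \;=\; x_k - \tfrac{1}{L}\,\nabla f(x_k),
  \qquad
  f(x_k) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^{*}\bigr),
\]
which yields the ${O}\left(\frac{L}{\mu}\log\frac{1}{\epsilon}\right)$ upper bound that the lower bound in this paper matches.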