Tighter Analysis for ProxSkip

Zhengmian Hu, Heng Huang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:13469-13496, 2023.

Abstract

In this paper, we provide a tighter analysis for ProxSkip, an algorithm that allows fewer proximal operator computations to solve composite optimization problems. We improve the existing decreasing speed of the Lyapunov function from $\mathcal{O}(p^2)$ to $\mathcal{O}(p)$ when $p$, the frequency of the proximal operator calls, is small enough. Our theoretical analysis also reveals the drawbacks of using large step sizes for gradient descent in ProxSkip when the proximal operator part is the bottleneck. Our main motivation comes from the continuous limit, in which the original analysis of ProxSkip fails to guarantee convergence when both the step size $\gamma$ and the frequency $p$ tend to zero. We construct a counterexample to demonstrate why such counterintuitive behavior occurs for the original analysis, and then propose a novel Lyapunov function variant that yields a tighter analysis and avoids the problem of the old one. This new Lyapunov function can be directly extended to many other variants of ProxSkip. When applied to the stochastic gradient setup, our analysis improves the proximal operator complexity of SProxSkip from $\mathcal{O}(\sqrt{\frac{1}{\varepsilon\mu^2}}\log(\frac{1}{\varepsilon}))$ to $\mathcal{O}(\sqrt{\kappa}\log(\frac{1}{\varepsilon}))$.
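For readers unfamiliar with ProxSkip (Mishchenko et al., 2022), the Lyapunov function referenced above is, in the original analysis, $\Psi_t = \|x_t - x_\star\|^2 + \frac{\gamma^2}{p^2}\|h_t - h_\star\|^2$, where $h_t$ is a control variate converging to $\nabla f(x_\star)$; for $\gamma \le 1/L$ it contracts in expectation at rate $1 - \min\{\gamma\mu, p^2\}$ per iteration, and the present paper tightens the $p^2$ term to $\mathcal{O}(p)$ for small $p$. The following is a minimal NumPy sketch of the ProxSkip iteration, applied to a toy composite problem $\min_x f(x) + \psi(x)$ with $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ and $\psi(x) = \lambda\|x\|_1$ (so the prox is soft-thresholding). The problem instance, step size, probability $p$, and helper names are illustrative choices, not taken from the paper.

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proxskip(grad_f, prox_psi, x0, gamma, p, T, seed=None):
    # ProxSkip sketch: each step takes a gradient step shifted by a control
    # variate h, and applies the (rescaled) prox only with probability p.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    h = np.zeros_like(x0)                       # control variate, tends to grad f(x*)
    for _ in range(T):
        x_hat = x - gamma * (grad_f(x) - h)     # shifted gradient step
        if rng.random() < p:                    # expensive prox step, prob. p
            x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:                                   # skip the prox
            x = x_hat
        h = h + (p / gamma) * (x - x_hat)       # update control variate
    return x

# Toy composite problem: f(x) = 0.5*||Ax - b||^2, psi(x) = lam*||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2                   # smoothness constant of f

x = proxskip(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_psi=lambda z, t: soft_threshold(z, t * lam),
    x0=np.zeros(20),
    gamma=1.0 / L,
    p=0.2,
    T=2000,
    seed=1,
)
print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())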

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-hu23a,
  title     = {Tighter Analysis for {P}rox{S}kip},
  author    = {Hu, Zhengmian and Huang, Heng},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {13469--13496},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/hu23a/hu23a.pdf},
  url       = {https://proceedings.mlr.press/v202/hu23a.html},
  abstract  = {In this paper, we provide a tighter analysis for ProxSkip, an algorithm that allows fewer proximal operator computations to solve composite optimization problems. We improve the existing decreasing speed of Lyapunov function from $\mathcal{O}(p^2)$ to $\mathcal{O}(p)$, when $p$, the frequency of the proximal operators is small enough. Our theoretical analysis also reveals the drawbacks of using large step sizes for gradient descent in ProxSkip when the proximal operator part is the bottleneck. Our main motivation comes from the continuous limit in which the original analysis of ProxSkip fails to guarantee convergence when both the step size $\gamma$ and frequency $p$ tend to zero. We construct a counterexample to demonstrate why such counterintuitive behavior occurs for the original analysis and then propose a novel Lyapunov function variant to construct a tighter analysis, avoiding the problem of the old one. Such a new Lyapunov function can be directly extended to many other variants of ProxSkip. When applied to stochastic gradient setup, our analysis leads to an improved proximal operator complexity for SProxSkip from $\mathcal{O}(\sqrt{\frac{1}{\varepsilon\mu^2}}\log(\frac{1}{\varepsilon}))$ to $\mathcal{O}(\sqrt{\kappa}\log(\frac{1}{\varepsilon}))$.}
}
Endnote
%0 Conference Paper
%T Tighter Analysis for ProxSkip
%A Zhengmian Hu
%A Heng Huang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-hu23a
%I PMLR
%P 13469--13496
%U https://proceedings.mlr.press/v202/hu23a.html
%V 202
%X In this paper, we provide a tighter analysis for ProxSkip, an algorithm that allows fewer proximal operator computations to solve composite optimization problems. We improve the existing decreasing speed of Lyapunov function from $\mathcal{O}(p^2)$ to $\mathcal{O}(p)$, when $p$, the frequency of the proximal operators is small enough. Our theoretical analysis also reveals the drawbacks of using large step sizes for gradient descent in ProxSkip when the proximal operator part is the bottleneck. Our main motivation comes from the continuous limit in which the original analysis of ProxSkip fails to guarantee convergence when both the step size $\gamma$ and frequency $p$ tend to zero. We construct a counterexample to demonstrate why such counterintuitive behavior occurs for the original analysis and then propose a novel Lyapunov function variant to construct a tighter analysis, avoiding the problem of the old one. Such a new Lyapunov function can be directly extended to many other variants of ProxSkip. When applied to stochastic gradient setup, our analysis leads to an improved proximal operator complexity for SProxSkip from $\mathcal{O}(\sqrt{\frac{1}{\varepsilon\mu^2}}\log(\frac{1}{\varepsilon}))$ to $\mathcal{O}(\sqrt{\kappa}\log(\frac{1}{\varepsilon}))$.
APA
Hu, Z. & Huang, H. (2023). Tighter Analysis for ProxSkip. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:13469-13496. Available from https://proceedings.mlr.press/v202/hu23a.html.
