Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods
Proceedings of the Thirty Fifth Conference on Learning Theory, PMLR 178:4362-4415, 2022.
Abstract
We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness, recently proposed by Cohen, Sidford, and Tian ’21.

(1) We study separable minimax optimization problems of the form $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, \mu^x)$ and $(L^y, \mu^y)$, respectively, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian. We provide an algorithm using $\tilde{O}(\sqrt{\frac{L^x}{\mu^x}} + \sqrt{\frac{L^y}{\mu^y}} + \frac{\Lambda^{xx}}{\mu^x} + \frac{\Lambda^{xy}}{\sqrt{\mu^x\mu^y}} + \frac{\Lambda^{yy}}{\mu^y})$ gradient queries. Notably, for convex-concave minimax problems with bilinear coupling (e.g., quadratics), where $\Lambda^{xx} = \Lambda^{yy} = 0$, our rate matches a lower bound of Zhang, Hong, and Zhang ’19.

(2) We study finite sum optimization problems of the form $\min_x \frac{1}{n} \sum_{i \in [n]} f_i(x)$, where each $f_i$ is $L_i$-smooth and the overall problem is $\mu$-strongly convex. We provide an algorithm using $\tilde{O}(n + \sum_{i \in [n]} \sqrt{\frac{L_i}{n\mu}})$ gradient queries. Notably, when the smoothness bounds $\{L_i\}_{i\in[n]}$ are non-uniform, our rate improves upon accelerated SVRG (Lin et al., Frostig et al. ’15) and Katyusha (Allen-Zhu ’17) by up to a $\sqrt{n}$ factor.

(3) We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, recovering both of the above results up to a logarithmic factor.
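To make the rate in (1) concrete, consider the bilinearly coupled case highlighted above. As an illustrative sketch (the coupling matrix $A$ is our notation, not the paper's), take $h(x, y) = y^\top A x$, so that $\Lambda^{xx} = \Lambda^{yy} = 0$ and $\Lambda^{xy} = \|A\|_{\mathrm{op}}$. The general bound then specializes to

$$\tilde{O}\left(\sqrt{\frac{L^x}{\mu^x}} + \sqrt{\frac{L^y}{\mu^y}} + \frac{\|A\|_{\mathrm{op}}}{\sqrt{\mu^x \mu^y}}\right)$$

gradient queries, which is the regime in which the rate matches the lower bound of Zhang, Hong, and Zhang ’19.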
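The up-to-$\sqrt{n}$ gain in (2) can be seen with a short back-of-the-envelope calculation under stated assumptions: suppose one summand is much less smooth than the rest, say $L_1 = n^2 L$ and $L_i = L$ for $i \geq 2$, and take the accelerated SVRG/Katyusha baseline to be $\tilde{O}(n + \sqrt{n\bar{L}/\mu})$ with $\bar{L} = \frac{1}{n}\sum_{i \in [n]} L_i$ (our reading of the baseline rate, not a statement from the paper). Here $\bar{L} \approx nL$, so the baseline's query count beyond the $n$ term is on the order of $\sqrt{n\bar{L}/\mu} \approx n\sqrt{L/\mu}$, while the new bound gives

$$\sum_{i \in [n]} \sqrt{\frac{L_i}{n\mu}} = \sqrt{\frac{n^2 L}{n\mu}} + (n-1)\sqrt{\frac{L}{n\mu}} = O\left(\sqrt{\frac{nL}{\mu}}\right),$$

smaller by a $\sqrt{n}$ factor, consistent with the claimed improvement for non-uniform smoothness bounds.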