Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization

Guodong Zhang, Yuanhao Wang, Laurent Lessard, Roger B. Grosse
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7659-7679, 2022.

Abstract

Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, the majority of existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant factor. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that optimism helps only for simultaneous algorithms.
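
For readers who want the two update rules at a glance, the following minimal sketch (not taken from the paper) contrasts Sim-GDA and Alt-GDA on a toy SCSC quadratic. The objective f(x, y) = 0.5*mu*x^2 + b*x*y - 0.5*mu*y^2, the constants mu, b, the step size eta, the iteration count, and the function names sim_gda/alt_gda are all assumptions chosen purely for illustration.

# Illustrative sketch, not from the paper: contrast Sim-GDA and Alt-GDA on a toy
# strongly convex-strongly concave (SCSC) quadratic
#     f(x, y) = 0.5*mu*x**2 + b*x*y - 0.5*mu*y**2,
# whose unique saddle point is (0, 0). All constants below are assumptions.
import numpy as np

mu, b = 1.0, 10.0                       # curvature and coupling strength (assumed)
eta = 0.01                              # shared step size (assumed)
grad_x = lambda x, y: mu * x + b * y    # df/dx
grad_y = lambda x, y: b * x - mu * y    # df/dy

def sim_gda(x, y, steps):
    """Simultaneous GDA: both players step from the same iterate (x_t, y_t)."""
    for _ in range(steps):
        x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
    return x, y

def alt_gda(x, y, steps):
    """Alternating GDA: the ascent step uses the freshly updated x_{t+1}."""
    for _ in range(steps):
        x = x - eta * grad_x(x, y)
        y = y + eta * grad_y(x, y)      # x is already x_{t+1} here
    return x, y

if __name__ == "__main__":
    print("Sim-GDA distance to saddle:", np.hypot(*sim_gda(1.0, 1.0, 500)))
    print("Alt-GDA distance to saddle:", np.hypot(*alt_gda(1.0, 1.0, 500)))

On this toy example both methods converge, but Alt-GDA contracts noticeably faster, matching the qualitative claim in the abstract; the precise rates proved in the paper apply to general SCSC problems, not just this illustration.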

Cite this Paper

BibTeX
@InProceedings{pmlr-v151-zhang22e,
  title     = {Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization},
  author    = {Zhang, Guodong and Wang, Yuanhao and Lessard, Laurent and Grosse, Roger B.},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {7659--7679},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/zhang22e/zhang22e.pdf},
  url       = {https://proceedings.mlr.press/v151/zhang22e.html},
  abstract  = {Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, the majority of existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant factor. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that optimism helps only for simultaneous algorithms.}
}
Endnote
%0 Conference Paper
%T Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization
%A Guodong Zhang
%A Yuanhao Wang
%A Laurent Lessard
%A Roger B. Grosse
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-zhang22e
%I PMLR
%P 7659--7679
%U https://proceedings.mlr.press/v151/zhang22e.html
%V 151
%X Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, the majority of existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant factor. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that optimism helps only for simultaneous algorithms.
APA
Zhang, G., Wang, Y., Lessard, L. & Grosse, R.B. (2022). Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:7659-7679. Available from https://proceedings.mlr.press/v151/zhang22e.html.