Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements

Kyuwon Kim, Donghwan Kim
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24059-24093, 2024.

Abstract

In nonconvex-nonconcave minimax optimization, two-timescale gradient methods have shown their potential to find local minimax (optimal) points, provided that the timescale separation between the min and the max player is sufficiently large. However, existing two-timescale variants of gradient descent ascent and extragradient methods face two shortcomings, especially when searching for non-strict local minimax points, which are prevalent in modern overparameterized settings: (1) these methods can be unstable at some non-strict local minimax points even with sufficiently large timescale separation, and (2) computing a proper amount of timescale separation is infeasible in practice. To remedy these two issues, we propose to incorporate two simple but provably effective schemes, a double-step alternating update and an increasing timescale separation, into the two-timescale extragradient method, which respectively address these two issues. Under mild conditions, we show that the proposed methods converge to non-strict local minimax points to which all existing two-timescale methods fail to converge.
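To make the algorithmic template concrete, the short Python sketch below illustrates a plain two-timescale extragradient update for min_x max_y f(x, y), in which the min player's step size is divided by a timescale-separation factor that increases over iterations. This is only an illustrative toy under assumed choices (the objective, step size eta, and tau schedule are hypothetical), not the double-step alternating scheme analyzed in the paper.

# Illustrative sketch only: a simultaneous two-timescale extragradient (EG) update
# for min_x max_y f(x, y), with an increasing timescale-separation factor tau that
# slows the min player relative to the max player. The objective, gradients, step
# size, and tau schedule are placeholder choices, not the schemes from the paper.

def grad_x(x, y):
    # Gradient in x of the toy objective f(x, y) = 0.5*x**2*y - 0.5*y**2 + x*y.
    return x * y + y

def grad_y(x, y):
    # Gradient in y of the same toy objective.
    return 0.5 * x**2 - y + x

def two_timescale_eg(x, y, eta=0.05, iters=2000):
    for k in range(iters):
        tau = 1.0 + 0.1 * k  # increasing timescale separation (hypothetical schedule)
        # Extrapolation step: the min player uses the smaller step size eta / tau.
        x_half = x - (eta / tau) * grad_x(x, y)
        y_half = y + eta * grad_y(x, y)
        # Update step, with gradients evaluated at the extrapolated point.
        x = x - (eta / tau) * grad_x(x_half, y_half)
        y = y + eta * grad_y(x_half, y_half)
    return x, y

print(two_timescale_eg(1.0, 1.0))

For this toy objective the iterates drift toward a stationary point of the max player's best response; the paper's contribution is precisely the modifications (double-step alternating updates and an increasing separation schedule) that make such dynamics provably stable at non-strict local minimax points.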

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kim24m,
  title     = {Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements},
  author    = {Kim, Kyuwon and Kim, Donghwan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {24059--24093},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kim24m/kim24m.pdf},
  url       = {https://proceedings.mlr.press/v235/kim24m.html},
  abstract  = {In nonconvex-nonconcave minimax optimization, two-timescale gradient methods have shown their potential to find local minimax (optimal) points, provided that the timescale separation between the min and the max player is sufficiently large. However, existing two-timescale variants of gradient descent ascent and extragradient methods face two shortcomings, especially when searching for non-strict local minimax points, which are prevalent in modern overparameterized settings: (1) these methods can be unstable at some non-strict local minimax points even with sufficiently large timescale separation, and (2) computing a proper amount of timescale separation is infeasible in practice. To remedy these two issues, we propose to incorporate two simple but provably effective schemes, a double-step alternating update and an increasing timescale separation, into the two-timescale extragradient method, which respectively address these two issues. Under mild conditions, we show that the proposed methods converge to non-strict local minimax points to which all existing two-timescale methods fail to converge.}
}
Endnote
%0 Conference Paper
%T Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements
%A Kyuwon Kim
%A Donghwan Kim
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kim24m
%I PMLR
%P 24059--24093
%U https://proceedings.mlr.press/v235/kim24m.html
%V 235
%X In nonconvex-nonconcave minimax optimization, two-timescale gradient methods have shown their potential to find local minimax (optimal) points, provided that the timescale separation between the min and the max player is sufficiently large. However, existing two-timescale variants of gradient descent ascent and extragradient methods face two shortcomings, especially when searching for non-strict local minimax points, which are prevalent in modern overparameterized settings: (1) these methods can be unstable at some non-strict local minimax points even with sufficiently large timescale separation, and (2) computing a proper amount of timescale separation is infeasible in practice. To remedy these two issues, we propose to incorporate two simple but provably effective schemes, a double-step alternating update and an increasing timescale separation, into the two-timescale extragradient method, which respectively address these two issues. Under mild conditions, we show that the proposed methods converge to non-strict local minimax points to which all existing two-timescale methods fail to converge.
APA
Kim, K. & Kim, D. (2024). Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:24059-24093. Available from https://proceedings.mlr.press/v235/kim24m.html.