Open Problem: Is There a First-Order Method that Only Converges to Local Minimax Optima?
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:5957-5964, 2023.
Abstract
Can we effectively train a generative adversarial network (GAN), i.e., optimize a minimax problem, by gradient methods, just as we minimize a function when training classification neural networks? Currently, the answer to this question is “No”. Despite extensive study, training GANs remains challenging, and diffusion-based generative models are largely replacing them. When training GANs, we not only struggle to find stationary points but also suffer from the so-called mode-collapse phenomenon, in which the generated samples lack the diversity of the training data. Due to the nature of GANs, mode collapse is likely to occur when we find an optimal point of the maximin problem rather than of the original minimax problem. This suggests that answering the open question of whether there exists a first-order method that only converges to (local) optima of minimax problems could resolve these issues. None of the existing methods possesses such a property, neither theoretically nor practically, in contrast to standard gradient descent, which successfully finds local minima. For nonconvex-nonconcave minimax problems, Jin et al. were the first to propose an appropriate notion of local optimality that accounts for the order of minimization and maximization. They also gave a partial answer to the question above, showing that two-timescale gradient descent ascent converges to strict local minimax optima. However, convergence to general local minimax optima was left mostly unexplored, even though non-strict local minimax optima are prevalent. Our recent findings show that some non-strict local minimax optima can be found by a two-timescale extragradient method. This positive result brings new attention to the open question. Furthermore, we wish to revive the discussion on the appropriate notion of local minimax optimum, which was initially raised by Jin et al. but little pursued thereafter, and which we believe is crucial in answering the open question.
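For reference, the notion of local optimality mentioned above can be stated as follows; this is our paraphrase of the definition of a local minimax point from Jin et al., and readers should consult their paper for the exact statement. A point $(x^*, y^*)$ is a local minimax point of $f$ if there exist $\delta_0 > 0$ and a function $h$ with $h(\delta) \to 0$ as $\delta \to 0$ such that, for every $\delta \in (0, \delta_0]$ and every $(x, y)$ with $\|x - x^*\| \le \delta$ and $\|y - y^*\| \le \delta$,
$$ f(x^*, y) \;\le\; f(x^*, y^*) \;\le\; \max_{y' \,:\, \|y' - y^*\| \le h(\delta)} f(x, y'). $$
The asymmetry between the two inequalities is what encodes the order of minimization and maximization.

The two update rules mentioned above can also be illustrated with a minimal sketch. The code below is not the authors' implementation; the toy objective f(x, y) = 0.5*x^2 + 4*x*y - 0.5*y^2, the step sizes, and the function names are assumptions chosen only to show how two-timescale gradient descent ascent and a two-timescale extragradient step differ.

def f_grad(x, y):
    # Gradients of the illustrative objective f(x, y) = 0.5*x**2 + 4*x*y - 0.5*y**2.
    return x + 4 * y, 4 * x - y

def two_timescale_gda(x, y, eta_x=0.01, eta_y=0.1, steps=1000):
    # Two-timescale GDA: gradient descent on x, gradient ascent on y,
    # with a larger (faster) step size on the maximization variable y.
    for _ in range(steps):
        gx, gy = f_grad(x, y)
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y

def two_timescale_extragradient(x, y, eta_x=0.01, eta_y=0.1, steps=1000):
    # Two-timescale extragradient: first take a look-ahead (extrapolation) step,
    # then update using the gradients evaluated at the look-ahead point.
    for _ in range(steps):
        gx, gy = f_grad(x, y)
        x_half, y_half = x - eta_x * gx, y + eta_y * gy      # extrapolation
        gx2, gy2 = f_grad(x_half, y_half)
        x, y = x - eta_x * gx2, y + eta_y * gy2              # update
    return x, y

print(two_timescale_gda(1.0, 1.0))            # approaches the saddle point (0, 0) on this toy problem
print(two_timescale_extragradient(1.0, 1.0))  # also approaches (0, 0) on this toy problem

The only difference between the two sketches is the extra gradient evaluation at the extrapolated point; on general nonconvex-nonconcave problems the two methods can behave very differently, which is the setting the open problem concerns.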