On Convergence in Wasserstein Distance and f-divergence Minimization Problems

Cheuk Ting Li, Jingwei Zhang, Farzan Farnia
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2062-2070, 2024.

Abstract

The zero-sum game in generative adversarial networks (GANs) for learning the distribution of observed data is known to reduce to the minimization of a divergence measure between the underlying and generative models. However, the current theoretical understanding of the role of the target divergence in the characteristics of GANs’ generated samples remains largely inadequate. In this work, we aim to analyze the influence of the divergence measure on the local optima and convergence properties of divergence minimization problems in learning a multi-modal data distribution. We show that a mode-seeking f-divergence, e.g., the Jensen-Shannon (JS) divergence in the vanilla GAN, can lead to poor locally optimal solutions that miss some underlying modes. On the other hand, we demonstrate that the optimization landscape of the 1-Wasserstein distance in Wasserstein GANs does not suffer from such suboptimal local minima. Furthermore, we prove that a randomly initialized gradient-based optimization of the Wasserstein distance will, with high probability, capture all the existing modes. We present numerical results on standard image datasets, revealing the success of Wasserstein GANs compared to JS-GANs in avoiding suboptimal local optima under a mixture model.
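For reference, the abstract contrasts two standard notions of discrepancy between the data distribution P and the generative model Q. The LaTeX snippet below records their textbook definitions; this is standard background material and not a statement reproduced from the paper itself.

% f-divergence generated by a convex f with f(1) = 0
% (assumes P is absolutely continuous with respect to Q)
D_f(P \,\|\, Q) \;=\; \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right) \mathrm{d}Q

% Jensen-Shannon divergence, the f-divergence behind the vanilla GAN objective (up to constants)
\mathrm{JS}(P, Q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\!\left(P \,\middle\|\, \tfrac{P+Q}{2}\right)
\;+\; \tfrac{1}{2}\,\mathrm{KL}\!\left(Q \,\middle\|\, \tfrac{P+Q}{2}\right)

% 1-Wasserstein distance in its Kantorovich--Rubinstein dual form,
% the objective approximated by the WGAN critic over 1-Lipschitz functions
W_1(P, Q) \;=\; \sup_{\|\psi\|_{\mathrm{Lip}} \le 1}
\; \mathbb{E}_{x \sim P}\!\left[\psi(x)\right] \;-\; \mathbb{E}_{x \sim Q}\!\left[\psi(x)\right]

In this notation, the vanilla GAN minimizes (up to additive and multiplicative constants) the JS divergence between P and Q, while the Wasserstein GAN critic plays the role of the 1-Lipschitz potential \psi in the dual form of W_1.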

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-ting-li24a,
  title     = {On Convergence in {W}asserstein Distance and f-divergence Minimization Problems},
  author    = {Ting Li, Cheuk and Zhang, Jingwei and Farnia, Farzan},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2062--2070},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/ting-li24a/ting-li24a.pdf},
  url       = {https://proceedings.mlr.press/v238/ting-li24a.html},
  abstract  = {The zero-sum game in generative adversarial networks (GANs) for learning the distribution of observed data is known to reduce to the minimization of a divergence measure between the underlying and generative models. However, the current theoretical understanding of the role of the target divergence in the characteristics of GANs’ generated samples remains largely inadequate. In this work, we aim to analyze the influence of the divergence measure on the local optima and convergence properties of divergence minimization problems in learning a multi-modal data distribution. We show a mode-seeking f-divergence, e.g. the Jensen-Shannon (JS) divergence in the vanilla GAN, could lead to poor locally optimal solutions missing some underlying modes. On the other hand, we demonstrate that the optimization landscape of 1-Wasserstein distance in Wasserstein GANs does not suffer from such suboptimal local minima. Furthermore, we prove that a randomly-initialized gradient-based optimization of the Wasserstein distance will, with high probability, capture all the existing modes. We present numerical results on standard image datasets, revealing the success of Wasserstein GANs compared to JS-GANs in avoiding suboptimal local optima under a mixture model.}
}
Endnote
%0 Conference Paper
%T On Convergence in Wasserstein Distance and f-divergence Minimization Problems
%A Cheuk Ting Li
%A Jingwei Zhang
%A Farzan Farnia
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-ting-li24a
%I PMLR
%P 2062--2070
%U https://proceedings.mlr.press/v238/ting-li24a.html
%V 238
%X The zero-sum game in generative adversarial networks (GANs) for learning the distribution of observed data is known to reduce to the minimization of a divergence measure between the underlying and generative models. However, the current theoretical understanding of the role of the target divergence in the characteristics of GANs’ generated samples remains largely inadequate. In this work, we aim to analyze the influence of the divergence measure on the local optima and convergence properties of divergence minimization problems in learning a multi-modal data distribution. We show a mode-seeking f-divergence, e.g. the Jensen-Shannon (JS) divergence in the vanilla GAN, could lead to poor locally optimal solutions missing some underlying modes. On the other hand, we demonstrate that the optimization landscape of 1-Wasserstein distance in Wasserstein GANs does not suffer from such suboptimal local minima. Furthermore, we prove that a randomly-initialized gradient-based optimization of the Wasserstein distance will, with high probability, capture all the existing modes. We present numerical results on standard image datasets, revealing the success of Wasserstein GANs compared to JS-GANs in avoiding suboptimal local optima under a mixture model.
APA
Ting Li, C., Zhang, J. & Farnia, F. (2024). On Convergence in Wasserstein Distance and f-divergence Minimization Problems. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2062-2070. Available from https://proceedings.mlr.press/v238/ting-li24a.html.