Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training

Mathieu Sinn, Ambrish Rawat
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:642-651, 2018.

Abstract

Generative Adversarial Networks (GANs) have become a widely popular framework for generative modelling of high-dimensional datasets. However, their training is well known to be difficult. This work presents a rigorous statistical analysis of GANs that provides straightforward explanations for common training pathologies such as vanishing gradients. Furthermore, it proposes a new training objective, Kernel GANs, and demonstrates its practical effectiveness on large-scale real-world datasets. A key element in the analysis is the distinction between training with respect to the (unknown) data distribution and training with respect to its empirical counterpart. To overcome issues in GAN training, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise in the input distributions of the discriminator. As we show, this effectively yields an empirical version of the JSD in which the true and generator densities are replaced by kernel density estimates, giving rise to Kernel GANs.
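The central idea sketched in the abstract, replacing the true and generator densities inside the JSD with kernel density estimates fitted to real and generated samples, can be illustrated with a short snippet. The code below is a minimal sketch, not the authors' implementation: it assumes Gaussian KDEs (scipy's gaussian_kde) and a simple Monte Carlo estimate of the two KL terms in the JSD; the function name kde_jsd and its bandwidth argument are hypothetical.

import numpy as np
from scipy.stats import gaussian_kde

def kde_jsd(real, fake, bandwidth=None):
    """Monte Carlo estimate of JSD(p_hat, q_hat), where p_hat and q_hat are
    Gaussian kernel density estimates fitted to real and generated samples.

    real, fake: arrays of shape (n_samples, dim)
    bandwidth:  optional bandwidth factor for scipy's KDE (Scott's rule if None)
    """
    # Fit KDEs to the data distribution (real) and the generator distribution (fake).
    # scipy expects the data in shape (dim, n_samples), hence the transposes.
    p_hat = gaussian_kde(real.T, bw_method=bandwidth)
    q_hat = gaussian_kde(fake.T, bw_method=bandwidth)

    def kl_to_mixture(kde_a, kde_b, samples):
        # Monte Carlo estimate of KL(a || (a+b)/2) using samples drawn from a.
        log_a = kde_a.logpdf(samples.T)
        log_b = kde_b.logpdf(samples.T)
        log_m = np.logaddexp(log_a, log_b) - np.log(2.0)  # log of the mixture density
        return np.mean(log_a - log_m)

    # JSD = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = (p + q) / 2.
    return 0.5 * kl_to_mixture(p_hat, q_hat, real) + 0.5 * kl_to_mixture(q_hat, p_hat, fake)

In an actual GAN training loop, the generator samples would be re-drawn at every step, and, loosely speaking, the KDE bandwidth plays the role of the noise added to the discriminator inputs that smooths the JSD.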

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-sinn18a,
  title     = {Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training},
  author    = {Sinn, Mathieu and Rawat, Ambrish},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {642--651},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/sinn18a/sinn18a.pdf},
  url       = {https://proceedings.mlr.press/v84/sinn18a.html},
  abstract  = {Generative Adversarial Networks (GANs) have become a widely popular framework for generative modelling of high-dimensional datasets. However their training is well-known to be difficult. This work presents a rigorous statistical analysis of GANs providing straight-forward explanations for common training pathologies such as vanishing gradients. Furthermore, it proposes a new training objective, Kernel GANs and demonstrates its practical effectiveness on large-scale real-world data sets. A key element in the analysis is the distinction between training with respect to the (unknown) data distribution, and its empirical counterpart. To overcome issues in GAN training, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise in the input distributions of the discriminator. As we show, this effectively leads to an empirical version of the JSD in which the true and the generator densities are replaced by kernel density estimates, which leads to Kernel GANs.}
}
Endnote
%0 Conference Paper
%T Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training
%A Mathieu Sinn
%A Ambrish Rawat
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-sinn18a
%I PMLR
%P 642--651
%U https://proceedings.mlr.press/v84/sinn18a.html
%V 84
%X Generative Adversarial Networks (GANs) have become a widely popular framework for generative modelling of high-dimensional datasets. However their training is well-known to be difficult. This work presents a rigorous statistical analysis of GANs providing straight-forward explanations for common training pathologies such as vanishing gradients. Furthermore, it proposes a new training objective, Kernel GANs and demonstrates its practical effectiveness on large-scale real-world data sets. A key element in the analysis is the distinction between training with respect to the (unknown) data distribution, and its empirical counterpart. To overcome issues in GAN training, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise in the input distributions of the discriminator. As we show, this effectively leads to an empirical version of the JSD in which the true and the generator densities are replaced by kernel density estimates, which leads to Kernel GANs.
APA
Sinn, M. & Rawat, A. (2018). Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:642-651. Available from https://proceedings.mlr.press/v84/sinn18a.html.
