A Three Sample Hypothesis Test for Evaluating Generative Models

Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3546-3556, 2020.

Abstract

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying – where the generative model memorizes and outputs training samples or small variations thereof. We provide a three sample test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets.
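
The abstract describes the test only at a high level. As a rough illustration of the core idea, the sketch below collapses the paper's per-cell analysis into a single global comparison: it measures each sample's distance to its nearest training point and asks, via a one-sided Mann-Whitney U test, whether the model's samples sit systematically closer to the training set than fresh samples from the target distribution do. This is a simplified sketch, not the authors' implementation, and all function names are illustrative.

import numpy as np
from scipy.stats import mannwhitneyu

def nn_distances(samples, train):
    # Euclidean distance from each sample to its nearest training point.
    diffs = samples[:, None, :] - train[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def data_copying_test(train, held_out, generated):
    # d_gen: how close the model's samples are to the training set.
    # d_new: how close genuinely new samples from the target are.
    d_gen = nn_distances(generated, train)
    d_new = nn_distances(held_out, train)
    # Under the null (no data-copying) the two distance samples look alike;
    # alternative="less" tests whether generated samples are stochastically
    # closer to the training set than held-out samples are.
    return mannwhitneyu(d_gen, d_new, alternative="less")

# Toy check: a "model" that memorizes training points plus tiny noise
# should be flagged, while an honest sampler should not.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
held_out = rng.normal(size=(200, 2))
copier = train[rng.integers(0, 500, size=200)] + 0.01 * rng.normal(size=(200, 2))
honest = rng.normal(size=(200, 2))
print(data_copying_test(train, held_out, copier))  # tiny p-value: copying flagged
print(data_copying_test(train, held_out, honest))  # large p-value: no copying

The paper itself runs this kind of comparison within cells of a partition of the instance space and aggregates the per-cell statistics, which makes the test sensitive to localized copying that a single global comparison can miss.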

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-meehan20a,
  title     = {A Three Sample Hypothesis Test for Evaluating Generative Models},
  author    = {Meehan, Casey and Chaudhuri, Kamalika and Dasgupta, Sanjoy},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {3546--3556},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/meehan20a/meehan20a.pdf},
  url       = {https://proceedings.mlr.press/v108/meehan20a.html},
  abstract  = {Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call \emph{data-copying} -- where the generative model memorizes and outputs training samples or small variations thereof. We provide a three sample test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets.}
}
Endnote
%0 Conference Paper
%T A Three Sample Hypothesis Test for Evaluating Generative Models
%A Casey Meehan
%A Kamalika Chaudhuri
%A Sanjoy Dasgupta
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-meehan20a
%I PMLR
%P 3546--3556
%U https://proceedings.mlr.press/v108/meehan20a.html
%V 108
%X Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying – where the generative model memorizes and outputs training samples or small variations thereof. We provide a three sample test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets.
APA
Meehan, C., Chaudhuri, K. & Dasgupta, S. (2020). A Three Sample Hypothesis Test for Evaluating Generative Models. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3546-3556. Available from https://proceedings.mlr.press/v108/meehan20a.html.