A Three Sample Hypothesis Test for Evaluating Generative Models


Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3546-3556, 2020.


Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets.
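To make the three-sample idea concrete, the following is a minimal sketch of one way such a test could be set up. The abstract does not specify the test statistic, so everything here is an assumption made for illustration: the function names `nearest_distance` and `copying_z_score` are hypothetical, the distance is plain Euclidean, and the comparison is a Mann-Whitney U statistic (normal approximation, no tie correction) on nearest-neighbor distances to the training set. A strongly negative score would suggest that generated samples lie closer to the training set than fresh samples from the target distribution do.

```python
import math
import random


def nearest_distance(point, train):
    """Euclidean distance from `point` to its nearest neighbor in `train`."""
    return min(math.dist(point, t) for t in train)


def copying_z_score(train, test, generated):
    """Illustrative three-sample statistic (an assumption, not the paper's code).

    Compares distances-to-training-set of generated points against those of
    held-out test points using a Mann-Whitney U z-score. Strongly negative
    values mean generated points sit unusually close to the training set.
    """
    d_test = [nearest_distance(x, train) for x in test]
    d_gen = [nearest_distance(x, train) for x in generated]
    m, n = len(d_gen), len(d_test)
    # U counts pairs where a generated distance exceeds a test distance;
    # ties contribute one half.
    u = sum(
        1.0 if g > t else 0.5 if g == t else 0.0
        for g in d_gen
        for t in d_test
    )
    mean = m * n / 2.0
    std = math.sqrt(m * n * (m + n + 1) / 12.0)
    return (u - mean) / std


# Toy demonstration on synthetic 2-D Gaussian data (hypothetical setup).
rng = random.Random(0)


def sample(count):
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(count)]


train = sample(200)
test = sample(100)

# A "copying" model: training points plus tiny noise.
copied = [
    (x + rng.gauss(0, 0.01), y + rng.gauss(0, 0.01))
    for x, y in rng.choices(train, k=100)
]
# A well-behaved model: fresh draws from the target distribution.
fresh = sample(100)

z_copy = copying_z_score(train, test, copied)
z_fresh = copying_z_score(train, test, fresh)
print(f"copying model z = {z_copy:.2f}, fresh model z = {z_fresh:.2f}")
```

In this toy setup the copying model should receive a far more negative score than the model that draws fresh samples. The held-out test sample acts as the baseline for how close to the training set a genuinely new draw is expected to be, which is why three samples are needed rather than two.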
