Precision Recall Cover: A Method For Assessing Generative Models

Fasil Cheema, Ruth Urner
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6571-6594, 2023.

Abstract

Generative modelling has seen enormous practical advances over the past few years. Evaluating the quality of a generative system, however, is often still based on subjective human inspection. To overcome this, the research community has very recently turned to exploring formal evaluation metrics and methods. In this work, we propose a novel evaluation paradigm based on a two-way nearest-neighbor neighborhood test. We define a novel measure of mutual coverage for two continuous probability distributions. From this, we derive an empirical analogue and show analytically that it exhibits favorable theoretical properties while also being straightforward to compute. We show that, while algorithmically simple, our derived method is also statistically sound. In contrast to previously employed distance measures, our measure naturally stems from a notion of local discrepancy, which can be accessed separately. This gives practitioners more detailed diagnostic information about where their generative models perform well and where they fail. We complement our analysis with a systematic experimental evaluation and comparison to other recently proposed measures. Using a wide array of experiments, we demonstrate our algorithm's strengths over other existing methods and confirm the results of our theoretical analysis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-cheema23a,
  title = {Precision Recall Cover: A Method For Assessing Generative Models},
  author = {Cheema, Fasil and Urner, Ruth},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {6571--6594},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/cheema23a/cheema23a.pdf},
  url = {https://proceedings.mlr.press/v206/cheema23a.html},
  abstract = {Generative modelling has seen enormous practical advances over the past few years. Evaluating the quality of a generative system, however, is often still based on subjective human inspection. To overcome this, the research community has very recently turned to exploring formal evaluation metrics and methods. In this work, we propose a novel evaluation paradigm based on a two-way nearest-neighbor neighborhood test. We define a novel measure of mutual coverage for two continuous probability distributions. From this, we derive an empirical analogue and show analytically that it exhibits favorable theoretical properties while also being straightforward to compute. We show that, while algorithmically simple, our derived method is also statistically sound. In contrast to previously employed distance measures, our measure naturally stems from a notion of local discrepancy, which can be accessed separately. This gives practitioners more detailed diagnostic information about where their generative models perform well and where they fail. We complement our analysis with a systematic experimental evaluation and comparison to other recently proposed measures. Using a wide array of experiments, we demonstrate our algorithm's strengths over other existing methods and confirm the results of our theoretical analysis.}
}
Endnote
%0 Conference Paper
%T Precision Recall Cover: A Method For Assessing Generative Models
%A Fasil Cheema
%A Ruth Urner
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-cheema23a
%I PMLR
%P 6571--6594
%U https://proceedings.mlr.press/v206/cheema23a.html
%V 206
%X Generative modelling has seen enormous practical advances over the past few years. Evaluating the quality of a generative system, however, is often still based on subjective human inspection. To overcome this, the research community has very recently turned to exploring formal evaluation metrics and methods. In this work, we propose a novel evaluation paradigm based on a two-way nearest-neighbor neighborhood test. We define a novel measure of mutual coverage for two continuous probability distributions. From this, we derive an empirical analogue and show analytically that it exhibits favorable theoretical properties while also being straightforward to compute. We show that, while algorithmically simple, our derived method is also statistically sound. In contrast to previously employed distance measures, our measure naturally stems from a notion of local discrepancy, which can be accessed separately. This gives practitioners more detailed diagnostic information about where their generative models perform well and where they fail. We complement our analysis with a systematic experimental evaluation and comparison to other recently proposed measures. Using a wide array of experiments, we demonstrate our algorithm's strengths over other existing methods and confirm the results of our theoretical analysis.
APA
Cheema, F. & Urner, R. (2023). Precision Recall Cover: A Method For Assessing Generative Models. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:6571-6594. Available from https://proceedings.mlr.press/v206/cheema23a.html.
