Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions

Mai Elkady, Thu Bui, Bruno Ribeiro, David Inouye
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1232-1256, 2024.

Abstract

There has been a growing excitement that implicit graph generative models could be used to design or discover new molecules for medicine or material design. Because these molecules have not been discovered, they naturally lie in unexplored or scarcely supported regions of the distribution of known molecules. However, prior evaluation methods for implicit graph generative models have focused on validating statistics computed from the thick support (e.g., mean and variance of a graph property). Therefore, there is a mismatch between the goal of generating novel graphs and the evaluation methods. To address this evaluation gap, we design a novel evaluation method called Vertical Validation (VV) that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data. This procedure can be seen as a generalization of the standard train-test procedure except that the splits are dependent on sample features. We demonstrate that our method can be used to perform model selection if performance on thin support regions is the desired goal. As a side benefit, we also show that our approach can better detect overfitting as exemplified by memorization.

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-elkady24a, title = {Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions}, author = {Elkady, Mai and Bui, Thu and Ribeiro, Bruno and Inouye, David}, booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence}, pages = {1232--1256}, year = {2024}, editor = {Kiyavash, Negar and Mooij, Joris M.}, volume = {244}, series = {Proceedings of Machine Learning Research}, month = {15--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/elkady24a/elkady24a.pdf}, url = {https://proceedings.mlr.press/v244/elkady24a.html}, abstract = {There has been a growing excitement that implicit graph generative models could be used to design or discover new molecules for medicine or material design. Because these molecules have not been discovered, they naturally lie in unexplored or scarcely supported regions of the distribution of known molecules. However, prior evaluation methods for implicit graph generative models have focused on validating statistics computed from the thick support (e.g., mean and variance of a graph property). Therefore, there is a mismatch between the goal of generating novel graphs and the evaluation methods. To address this evaluation gap, we design a novel evaluation method called Vertical Validation (VV) that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data. This procedure can be seen as a generalization of the standard train-test procedure except that the splits are dependent on sample features. We demonstrate that our method can be used to perform model selection if performance on thin support regions is the desired goal. As a side benefit, we also show that our approach can better detect overfitting as exemplified by memorization.} }
Endnote
%0 Conference Paper %T Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions %A Mai Elkady %A Thu Bui %A Bruno Ribeiro %A David Inouye %B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2024 %E Negar Kiyavash %E Joris M. Mooij %F pmlr-v244-elkady24a %I PMLR %P 1232--1256 %U https://proceedings.mlr.press/v244/elkady24a.html %V 244 %X There has been a growing excitement that implicit graph generative models could be used to design or discover new molecules for medicine or material design. Because these molecules have not been discovered, they naturally lie in unexplored or scarcely supported regions of the distribution of known molecules. However, prior evaluation methods for implicit graph generative models have focused on validating statistics computed from the thick support (e.g., mean and variance of a graph property). Therefore, there is a mismatch between the goal of generating novel graphs and the evaluation methods. To address this evaluation gap, we design a novel evaluation method called Vertical Validation (VV) that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data. This procedure can be seen as a generalization of the standard train-test procedure except that the splits are dependent on sample features. We demonstrate that our method can be used to perform model selection if performance on thin support regions is the desired goal. As a side benefit, we also show that our approach can better detect overfitting as exemplified by memorization.
APA
Elkady, M., Bui, T., Ribeiro, B. & Inouye, D.. (2024). Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:1232-1256 Available from https://proceedings.mlr.press/v244/elkady24a.html.

Related Material