Position: Graph Matching Systems Deserve Better Benchmarks

Indradyumna Roy, Saswat Meher, Eeshaan Jain, Soumen Chakrabarti, Abir De
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:82131-82150, 2025.

Abstract

Data sets used in recent work on graph similarity scoring and matching tasks suffer from significant limitations. Using Graph Edit Distance (GED) as a showcase, we highlight pervasive issues such as train-test leakage and poor generalization, which have misguided the community’s understanding and assessment of the capabilities of a method or model. These limitations arise, in part, because preparing labeled data is computationally expensive for combinatorial graph problems. We establish some key properties of GED that enable scalable data augmentation for training, and adversarial test set generation. Together, our analysis, experiments and insights establish new, sound guidelines for designing and evaluating future neural networks, and suggest open challenges for future research.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-roy25a,
  title = {Position: Graph Matching Systems Deserve Better Benchmarks},
  author = {Roy, Indradyumna and Meher, Saswat and Jain, Eeshaan and Chakrabarti, Soumen and De, Abir},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {82131--82150},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/roy25a/roy25a.pdf},
  url = {https://proceedings.mlr.press/v267/roy25a.html},
  abstract = {Data sets used in recent work on graph similarity scoring and matching tasks suffer from significant limitations. Using Graph Edit Distance (GED) as a showcase, we highlight pervasive issues such as train-test leakage and poor generalization, which have misguided the community’s understanding and assessment of the capabilities of a method or model. These limitations arise, in part, because preparing labeled data is computationally expensive for combinatorial graph problems. We establish some key properties of GED that enable scalable data augmentation for training, and adversarial test set generation. Together, our analysis, experiments and insights establish new, sound guidelines for designing and evaluating future neural networks, and suggest open challenges for future research.}
}
Endnote
%0 Conference Paper
%T Position: Graph Matching Systems Deserve Better Benchmarks
%A Indradyumna Roy
%A Saswat Meher
%A Eeshaan Jain
%A Soumen Chakrabarti
%A Abir De
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-roy25a
%I PMLR
%P 82131--82150
%U https://proceedings.mlr.press/v267/roy25a.html
%V 267
%X Data sets used in recent work on graph similarity scoring and matching tasks suffer from significant limitations. Using Graph Edit Distance (GED) as a showcase, we highlight pervasive issues such as train-test leakage and poor generalization, which have misguided the community’s understanding and assessment of the capabilities of a method or model. These limitations arise, in part, because preparing labeled data is computationally expensive for combinatorial graph problems. We establish some key properties of GED that enable scalable data augmentation for training, and adversarial test set generation. Together, our analysis, experiments and insights establish new, sound guidelines for designing and evaluating future neural networks, and suggest open challenges for future research.
APA
Roy, I., Meher, S., Jain, E., Chakrabarti, S., & De, A. (2025). Position: Graph Matching Systems Deserve Better Benchmarks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:82131-82150. Available from https://proceedings.mlr.press/v267/roy25a.html.