Complete-Tree Space Favors Data-Efficient Link Prediction

Chi Gao, Lukai Li, Yancheng Zhou, Shangqi Guo
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:18410-18429, 2025.

Abstract

Link prediction is a fundamental problem for network-structured data. However, the prevalent research paradigm tends to assume abundant observed links, overlooking the more challenging scenarios in which observed links are scarce and sample sizes are therefore insufficient. In real-world networks, hierarchical modularity, characterized by structured and nested connections, remains robust even when links are sparsely observed. To address the challenge of limited link samples, we propose leveraging hierarchical modularity as a prior structure. We introduce the complete-tree (CT) space, a discrete metric space with latent complete-tree structures, to formalize hierarchical modularity with an emphasis on its hierarchical permutation symmetry. Using group theory to quantify and compare the permutation symmetries of different spaces, we prove that the CT space admits a significantly lower sample-complexity bound than the commonly used Euclidean space. We develop leaf matching, a data-efficient network embedding that maps nodes onto the CT space and performs discrete optimization by reducing it to decentralized search. Experiments verify the data efficiency of the CT space over other spaces. Moreover, leaf matching outperforms the state-of-the-art graph transformer in data-scarce scenarios while exhibiting excellent scalability. The code is available at: https://github.com/KevinGao7/LeafMatching.
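The abstract describes the CT space as a discrete metric space with latent complete-tree structures onto whose leaves nodes are embedded. As a rough, hypothetical illustration only (not the paper's exact CT-space definition), the sketch below encodes each leaf of a complete b-ary tree of depth D as a length-D tuple of child indices and uses the tree path length between leaves, determined by their lowest common ancestor, as a discrete metric; the names Leaf and tree_distance are illustrative and do not come from the paper or its code.

# Hypothetical sketch of a complete-tree metric between leaves; an
# illustration of the general idea, not the paper's CT-space definition.
from typing import Tuple

Leaf = Tuple[int, ...]  # e.g. (0, 2, 1): a leaf of a depth-3 tree with branching >= 3

def tree_distance(u: Leaf, v: Leaf) -> int:
    """Path length between two leaves of a complete tree of equal depth."""
    assert len(u) == len(v), "leaves must lie at the same depth"
    depth = len(u)
    # The lowest common ancestor sits at the length of the shared prefix.
    lca_depth = 0
    for cu, cv in zip(u, v):
        if cu != cv:
            break
        lca_depth += 1
    return 2 * (depth - lca_depth)

# Leaves sharing a parent are at distance 2; leaves of a depth-3 tree that
# share only the root are at distance 6.
assert tree_distance((0, 1, 2), (0, 1, 0)) == 2
assert tree_distance((0, 1, 2), (2, 0, 0)) == 6

Under such a metric, nearby leaves share long ancestor prefixes, which is one way the nested, modular structure of a complete tree can be reflected in pairwise distances.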

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-gao25g,
  title     = {Complete-Tree Space Favors Data-Efficient Link Prediction},
  author    = {Gao, Chi and Li, Lukai and Zhou, Yancheng and Guo, Shangqi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {18410--18429},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gao25g/gao25g.pdf},
  url       = {https://proceedings.mlr.press/v267/gao25g.html},
  abstract  = {Link prediction is a fundamental problem for network-structured data. However, the prevalent research paradigm tends to assume abundant observed links, overlooking more challenging scenarios with a scarcity of observed links, which results in insufficient sample sizes. In real-world networks, hierarchical modularity, characterized by structured and nested connections, remains robust even with sparsely observed links. To address the challenge of limited link samples, we propose leveraging hierarchical modularity as a prior structure. We introduce complete-tree (CT) space, a discrete metric space with latent complete-tree structures, to formalize hierarchical modularity with an emphasis on its hierarchical permutation symmetry. Utilizing the group theory to quantize and compare permutation symmetries of different spaces, we prove that the CT space provides a significantly lower bound on sample complexity than the commonly used Euclidean space. We develop leaf matching, a data-efficient network embedding that maps nodes onto the CT space and conducts discrete optimization by reducing it to decentralized search. Experiments verify the data efficiency of CT space over other spaces. Moreover, leaf matching outperforms the state-of-the-art graph transformer in data-scarce scenarios while exhibiting excellent scalability. The code is available at: https://github.com/KevinGao7/LeafMatching.}
}
Endnote
%0 Conference Paper
%T Complete-Tree Space Favors Data-Efficient Link Prediction
%A Chi Gao
%A Lukai Li
%A Yancheng Zhou
%A Shangqi Guo
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gao25g
%I PMLR
%P 18410--18429
%U https://proceedings.mlr.press/v267/gao25g.html
%V 267
%X Link prediction is a fundamental problem for network-structured data. However, the prevalent research paradigm tends to assume abundant observed links, overlooking more challenging scenarios with a scarcity of observed links, which results in insufficient sample sizes. In real-world networks, hierarchical modularity, characterized by structured and nested connections, remains robust even with sparsely observed links. To address the challenge of limited link samples, we propose leveraging hierarchical modularity as a prior structure. We introduce complete-tree (CT) space, a discrete metric space with latent complete-tree structures, to formalize hierarchical modularity with an emphasis on its hierarchical permutation symmetry. Utilizing the group theory to quantize and compare permutation symmetries of different spaces, we prove that the CT space provides a significantly lower bound on sample complexity than the commonly used Euclidean space. We develop leaf matching, a data-efficient network embedding that maps nodes onto the CT space and conducts discrete optimization by reducing it to decentralized search. Experiments verify the data efficiency of CT space over other spaces. Moreover, leaf matching outperforms the state-of-the-art graph transformer in data-scarce scenarios while exhibiting excellent scalability. The code is available at: https://github.com/KevinGao7/LeafMatching.
APA
Gao, C., Li, L., Zhou, Y. & Guo, S. (2025). Complete-Tree Space Favors Data-Efficient Link Prediction. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:18410-18429. Available from https://proceedings.mlr.press/v267/gao25g.html.
