Implicit degree bias in the link prediction task

Rachith Aiyappa, Xin Wang, Munjung Kim, Ozgur Can Seckin, Yong-Yeol Ahn, Sadamori Kojaku
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:874-908, 2025.

Abstract

Link prediction—a task of distinguishing actual hidden edges from random unconnected node pairs—is one of the quintessential tasks in graph machine learning. Despite being widely accepted as a universal benchmark and a downstream task for representation learning, the link prediction benchmark’s validity has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes. This produces a highly skewed evaluation that favors methods overly dependent on node degree. In fact a “null” link prediction method based solely on node degree can yield nearly optimal performance in this setting. We propose a degree-corrected link prediction benchmark that offers a more reasonable assessment and better aligns with the performance on the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-aiyappa25a, title = {Implicit degree bias in the link prediction task}, author = {Aiyappa, Rachith and Wang, Xin and Kim, Munjung and Seckin, Ozgur Can and Ahn, Yong-Yeol and Kojaku, Sadamori}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {874--908}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/aiyappa25a/aiyappa25a.pdf}, url = {https://proceedings.mlr.press/v267/aiyappa25a.html}, abstract = {Link prediction—a task of distinguishing actual hidden edges from random unconnected node pairs—is one of the quintessential tasks in graph machine learning. Despite being widely accepted as a universal benchmark and a downstream task for representation learning, the link prediction benchmark’s validity has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes. This produces a highly skewed evaluation that favors methods overly dependent on node degree. In fact a “null” link prediction method based solely on node degree can yield nearly optimal performance in this setting. We propose a degree-corrected link prediction benchmark that offers a more reasonable assessment and better aligns with the performance on the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.} }
Endnote
%0 Conference Paper %T Implicit degree bias in the link prediction task %A Rachith Aiyappa %A Xin Wang %A Munjung Kim %A Ozgur Can Seckin %A Yong-Yeol Ahn %A Sadamori Kojaku %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-aiyappa25a %I PMLR %P 874--908 %U https://proceedings.mlr.press/v267/aiyappa25a.html %V 267 %X Link prediction—a task of distinguishing actual hidden edges from random unconnected node pairs—is one of the quintessential tasks in graph machine learning. Despite being widely accepted as a universal benchmark and a downstream task for representation learning, the link prediction benchmark’s validity has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes. This produces a highly skewed evaluation that favors methods overly dependent on node degree. In fact a “null” link prediction method based solely on node degree can yield nearly optimal performance in this setting. We propose a degree-corrected link prediction benchmark that offers a more reasonable assessment and better aligns with the performance on the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.
APA
Aiyappa, R., Wang, X., Kim, M., Seckin, O.C., Ahn, Y. & Kojaku, S.. (2025). Implicit degree bias in the link prediction task. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:874-908 Available from https://proceedings.mlr.press/v267/aiyappa25a.html.

Related Material