Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks

Andreas Roth, Thomas Liebig
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1151-1166, 2024.

Abstract

Models with similar performance can disagree substantially in their predictions for individual samples, a phenomenon referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating how models that differ only in their initializations differ in the features they utilize for their predictions. We propose a novel metric called Influence Difference (ID) that quantifies how the reasons a node's prediction relies on vary across models by comparing their influence distributions. Additionally, we consider the differences between nodes with stable and unstable predictions, positing that both utilize different reasons to a similar degree and thus provide a meaningful gradient signal for closely matching two models even when their predictions for a node are similar. Based on this analysis, we propose minimizing ID in Knowledge Distillation, a setting in which a new model should closely match an established one. As an efficient approximation, we introduce DropDistillation (DD), which matches the output for a graph perturbed by edge deletions. Our empirical evaluation on six benchmark datasets for node classification confirms the differences in utilized features. DD outperforms previous methods in prediction stability and overall performance across all considered Knowledge Distillation experiments.
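
To make the Influence Difference idea concrete, the sketch below gives one possible reading of it, not the paper's exact definition: the influence of every node on a target node is measured as the summed absolute Jacobian of that node's output with respect to the node features, normalized to a distribution, and two models are compared with an L1 distance. The function names and the choice of distance are assumptions made for illustration.

```python
# Illustrative sketch of an Influence Difference (ID) computation; the exact
# definition in the paper may differ. `model_fn` maps node features X (N x F)
# to node outputs (N x C), with the graph fixed inside the closure.
import torch

def influence_distribution(model_fn, x, node_idx):
    # Influence of every node on node `node_idx`: summed absolute Jacobian of
    # that node's output with respect to all node features, normalized.
    x = x.clone().requires_grad_(True)
    out = model_fn(x)[node_idx]
    infl = torch.zeros(x.size(0))
    for k in range(out.numel()):
        grad = torch.autograd.grad(out[k], x, retain_graph=True)[0]
        infl += grad.abs().sum(dim=1)
    return infl / infl.sum()

def influence_difference(model_a_fn, model_b_fn, x, node_idx):
    # Compare the two models' influence distributions for one node
    # (L1 distance chosen for illustration).
    ia = influence_distribution(model_a_fn, x, node_idx)
    ib = influence_distribution(model_b_fn, x, node_idx)
    return (ia - ib).abs().sum()
```

Similarly, a DropDistillation-style objective can be sketched as a distillation step in which teacher and student are evaluated on the same graph with randomly deleted edges and the student is trained to match the teacher's output. The snippet below assumes a dense adjacency matrix, a toy two-layer GCN, a fixed drop rate, and a KL-based matching loss; the paper's actual architecture and loss may differ.

```python
# Illustrative sketch of a DropDistillation-style loss; names, the dense
# adjacency GCN, the drop rate, and the KL loss are assumptions for this
# example rather than the paper's exact formulation.
import torch
import torch.nn.functional as F

def normalize_adj(adj):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    adj = adj + torch.eye(adj.size(0))
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

def drop_edges(adj, p):
    # Randomly delete a fraction p of the undirected edges.
    keep = (torch.rand_like(adj) > p).float()
    keep = torch.triu(keep, diagonal=1)
    keep = keep + keep.t()
    return adj * keep

def gcn_forward(params, adj_norm, x):
    # Toy two-layer GCN: A_hat ReLU(A_hat X W1) W2.
    h = torch.relu(adj_norm @ x @ params["w1"])
    return adj_norm @ h @ params["w2"]

def drop_distillation_loss(student, teacher, adj, x, p_drop=0.3):
    # Evaluate teacher and student on the same edge-perturbed graph and
    # match the student's prediction to the teacher's.
    adj_pert = normalize_adj(drop_edges(adj, p_drop))
    with torch.no_grad():
        t_logits = gcn_forward(teacher, adj_pert, x)
    s_logits = gcn_forward(student, adj_pert, x)
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
```

With two parameter sets `params_a` and `params_b` and a fixed normalized adjacency `adj_norm`, the ID sketch can be invoked as influence_difference(lambda feat: gcn_forward(params_a, adj_norm, feat), lambda feat: gcn_forward(params_b, adj_norm, feat), x, node_idx).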

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-roth24a,
  title     = {Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks},
  author    = {Roth, Andreas and Liebig, Thomas},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages     = {1151--1166},
  year      = {2024},
  editor    = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume    = {222},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v222/roth24a/roth24a.pdf},
  url       = {https://proceedings.mlr.press/v222/roth24a.html}
}
Endnote
%0 Conference Paper
%T Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks
%A Andreas Roth
%A Thomas Liebig
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-roth24a
%I PMLR
%P 1151--1166
%U https://proceedings.mlr.press/v222/roth24a.html
%V 222
APA
Roth, A. & Liebig, T. (2024). Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:1151-1166. Available from https://proceedings.mlr.press/v222/roth24a.html.
