Calibration techniques for node classification using graph neural networks on medical image data

Iris Vos, Ishaan Bhat, Birgitta Velthuis, Ynte Ruigrok, Hugo Kuijf
Medical Imaging with Deep Learning, PMLR 227:1211-1224, 2024.

Abstract

Miscalibration of deep neural networks (DNNs) can lead to unreliable predictions and hinder their use in clinical decision-making. This miscalibration is often caused by overconfident probability estimates. Calibration techniques such as model ensembles, regularization terms, and post-hoc scaling of the predictions can be employed to improve the calibration performance of DNNs. In contrast to DNNs, graph neural networks (GNNs) tend to exhibit underconfidence. In this study, we investigate the efficacy of calibration techniques developed for DNNs when applied to GNNs trained on medical image data, and compare the calibration performance of binary and multiclass node classification on a benchmark dataset and a medical image dataset. We find that post-hoc methods using Platt scaling or Temperature scaling, or methods that add a regularization term to the loss function during training are most effective to improve calibration. Our results further indicate that these calibration techniques are more effective for multiclass classification tasks compared to binary classification tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v227-vos24a,
  title = {Calibration techniques for node classification using graph neural networks on medical image data},
  author = {Vos, Iris and Bhat, Ishaan and Velthuis, Birgitta and Ruigrok, Ynte and Kuijf, Hugo},
  booktitle = {Medical Imaging with Deep Learning},
  pages = {1211--1224},
  year = {2024},
  editor = {Oguz, Ipek and Noble, Jack and Li, Xiaoxiao and Styner, Martin and Baumgartner, Christian and Rusu, Mirabela and Heinmann, Tobias and Kontos, Despina and Landman, Bennett and Dawant, Benoit},
  volume = {227},
  series = {Proceedings of Machine Learning Research},
  month = {10--12 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v227/vos24a/vos24a.pdf},
  url = {https://proceedings.mlr.press/v227/vos24a.html},
  abstract = {Miscalibration of deep neural networks (DNNs) can lead to unreliable predictions and hinder their use in clinical decision-making. This miscalibration is often caused by overconfident probability estimates. Calibration techniques such as model ensembles, regularization terms, and post-hoc scaling of the predictions can be employed to improve the calibration performance of DNNs. In contrast to DNNs, graph neural networks (GNNs) tend to exhibit underconfidence. In this study, we investigate the efficacy of calibration techniques developed for DNNs when applied to GNNs trained on medical image data, and compare the calibration performance of binary and multiclass node classification on a benchmark dataset and a medical image dataset. We find that post-hoc methods using Platt scaling or Temperature scaling, or methods that add a regularization term to the loss function during training are most effective to improve calibration. Our results further indicate that these calibration techniques are more effective for multiclass classification tasks compared to binary classification tasks.}
}
Endnote
%0 Conference Paper
%T Calibration techniques for node classification using graph neural networks on medical image data
%A Iris Vos
%A Ishaan Bhat
%A Birgitta Velthuis
%A Ynte Ruigrok
%A Hugo Kuijf
%B Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ipek Oguz
%E Jack Noble
%E Xiaoxiao Li
%E Martin Styner
%E Christian Baumgartner
%E Mirabela Rusu
%E Tobias Heinmann
%E Despina Kontos
%E Bennett Landman
%E Benoit Dawant
%F pmlr-v227-vos24a
%I PMLR
%P 1211--1224
%U https://proceedings.mlr.press/v227/vos24a.html
%V 227
%X Miscalibration of deep neural networks (DNNs) can lead to unreliable predictions and hinder their use in clinical decision-making. This miscalibration is often caused by overconfident probability estimates. Calibration techniques such as model ensembles, regularization terms, and post-hoc scaling of the predictions can be employed to improve the calibration performance of DNNs. In contrast to DNNs, graph neural networks (GNNs) tend to exhibit underconfidence. In this study, we investigate the efficacy of calibration techniques developed for DNNs when applied to GNNs trained on medical image data, and compare the calibration performance of binary and multiclass node classification on a benchmark dataset and a medical image dataset. We find that post-hoc methods using Platt scaling or Temperature scaling, or methods that add a regularization term to the loss function during training are most effective to improve calibration. Our results further indicate that these calibration techniques are more effective for multiclass classification tasks compared to binary classification tasks.
APA
Vos, I., Bhat, I., Velthuis, B., Ruigrok, Y. & Kuijf, H. (2024). Calibration techniques for node classification using graph neural networks on medical image data. Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 227:1211-1224. Available from https://proceedings.mlr.press/v227/vos24a.html.
