Can You Trust Your Model? Constructing Uncertainty Approximations Guaranteeing Validity of Glioma Segmentation Explanations

Tianyi Ren, Daniel Low, Rachel Xiang, Pittra Jaengprajak, Juampablo Heras Rivera, Riley Olson, Jacob Ruzevick, Mehmet Kurt
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1407-1421, 2026.

Abstract

Deep learning models have been successfully applied to glioma segmentation from multi-contrast MRI, yet model reasoning is difficult to validate clinically. Prior work used contrast-level Shapley values to explain how individual MRI sequences contribute to segmentation performance, and showed that alignment between these explanations and protocol-derived contrast rankings is associated with improved model performance. However, a single trained model may not reflect the optimal population-level model, and naive Deep Ensemble uncertainty estimates provide no guarantee that the true optimal explanation lies within their intervals. In this work, we construct statistically valid uncertainty intervals for contrast-level Shapley values in glioma segmentation. Using a U-Net trained on the BraTS 2024 GoAT dataset, we compute Shapley values for each MRI contrast and tumor sub-region, form naive uncertainty estimates from cross-validation, and then apply a frequentist framework based on uniform convergence to define a confidence set of plausibly optimal models. By optimizing mixed objectives that trade off empirical loss and Shapley value, we approximate the Pareto frontier and obtain lower and upper bounds on the optimal explanation. We compare these intervals with clinically derived consensus and protocol rankings. Our results demonstrate that naive uncertainty estimates can lead to inconclusive or misleading conclusions about clinical alignment, whereas frequentist intervals provide principled guarantees on coverage of the optimal explanation and show moderate correlation with annotator consensus, enabling more reliable validation of model explanations against established clinical reasoning.
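
To make the abstract's contrast concrete, the sketch below computes exact contrast-level Shapley values for a four-contrast segmentation model, together with the naive cross-validation band the abstract cautions against. This is a minimal sketch in Python, not the authors' released code: the value function v (mapping a set of visible contrasts to, say, mean Dice score with the remaining input channels zeroed out), the contrast names, and all identifiers are illustrative assumptions. With four contrasts there are only 2^4 coalitions, so the Shapley sum can be enumerated exactly rather than approximated by Monte Carlo sampling.

# Sketch: exact contrast-level Shapley values plus a naive CV interval.
# All names here are hypothetical placeholders, not the paper's code.
from itertools import combinations
from math import factorial

CONTRASTS = ("t1", "t1ce", "t2", "flair")  # typical BraTS input channels

def shapley_values(v):
    """Exact Shapley value of each contrast under value function v.

    v maps a frozenset of contrast names to a scalar score (e.g., mean
    Dice over a validation set when only those contrasts are visible).
    With 4 players, all 2^4 coalitions are enumerated exactly.
    """
    n = len(CONTRASTS)
    phi = {}
    for c in CONTRASTS:
        others = [x for x in CONTRASTS if x != c]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude c
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                total += w * (v(S | {c}) - v(S))
        phi[c] = total
    return phi

def naive_cv_interval(per_fold_phi, z=1.96):
    """Naive band from per-fold Shapley estimates: mean +/- z * SE.

    This is the kind of cross-validation interval the paper argues
    carries no coverage guarantee for the optimal model's explanation.
    """
    m = len(per_fold_phi)
    mean = sum(per_fold_phi) / m
    var = sum((x - mean) ** 2 for x in per_fold_phi) / (m - 1)
    se = (var / m) ** 0.5
    return mean - z * se, mean + z * se

The frequentist intervals described in the abstract are different in kind: rather than aggregating per-fold estimates post hoc, they re-optimize a mixed objective, roughly minimizing empirical loss minus lambda times the Shapley value over a grid of lambda values, and report the extreme Shapley values among models whose empirical loss stays within a uniform-convergence tolerance of the minimum. That construction requires access to training, not just a fitted ensemble.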

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-ren26a,
  title     = {Can You Trust Your Model? Constructing Uncertainty Approximations Guaranteeing Validity of Glioma Segmentation Explanations},
  author    = {Ren, Tianyi and Low, Daniel and Xiang, Rachel and Jaengprajak, Pittra and Rivera, Juampablo Heras and Olson, Riley and Ruzevick, Jacob and Kurt, Mehmet},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {1407--1421},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/ren26a/ren26a.pdf},
  url       = {https://proceedings.mlr.press/v315/ren26a.html}
}
APA
Ren, T., Low, D., Xiang, R., Jaengprajak, P., Rivera, J. H., Olson, R., Ruzevick, J. & Kurt, M. (2026). Can You Trust Your Model? Constructing Uncertainty Approximations Guaranteeing Validity of Glioma Segmentation Explanations. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:1407-1421. Available from https://proceedings.mlr.press/v315/ren26a.html.
