Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis

Raghav Mehta, Changjian Shui, Tal Arbel
Medical Imaging with Deep Learning, PMLR 227:1453-1492, 2024.

Abstract

Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer’s disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of the model performances for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v227-mehta24a, title = {Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis}, author = {Mehta, Raghav and Shui, Changjian and Arbel, Tal}, booktitle = {Medical Imaging with Deep Learning}, pages = {1453--1492}, year = {2024}, editor = {Oguz, Ipek and Noble, Jack and Li, Xiaoxiao and Styner, Martin and Baumgartner, Christian and Rusu, Mirabela and Heinmann, Tobias and Kontos, Despina and Landman, Bennett and Dawant, Benoit}, volume = {227}, series = {Proceedings of Machine Learning Research}, month = {10--12 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v227/mehta24a/mehta24a.pdf}, url = {https://proceedings.mlr.press/v227/mehta24a.html}, abstract = {Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer’s disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of the model performances for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.} }
Endnote
%0 Conference Paper %T Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis %A Raghav Mehta %A Changjian Shui %A Tal Arbel %B Medical Imaging with Deep Learning %C Proceedings of Machine Learning Research %D 2024 %E Ipek Oguz %E Jack Noble %E Xiaoxiao Li %E Martin Styner %E Christian Baumgartner %E Mirabela Rusu %E Tobias Heinmann %E Despina Kontos %E Bennett Landman %E Benoit Dawant %F pmlr-v227-mehta24a %I PMLR %P 1453--1492 %U https://proceedings.mlr.press/v227/mehta24a.html %V 227 %X Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer’s disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of the model performances for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.
APA
Mehta, R., Shui, C. & Arbel, T.. (2024). Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis. Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 227:1453-1492 Available from https://proceedings.mlr.press/v227/mehta24a.html.

Related Material