How Fair are Medical Imaging Foundation Models?

Muhammad Osama Khan, Muhammad Muneeb Afzal, Shujaat Mirza, Yi Fang
Proceedings of the 3rd Machine Learning for Health Symposium, PMLR 225:217-231, 2023.

Abstract

While medical imaging foundation models have led to significant improvements across various tasks, the pivotal issue of subgroup fairness in these foundation models has remained largely unexplored. Our work bridges this research gap by presenting the first comprehensive study analyzing the subgroup fairness of six diverse foundation models, encompassing various pre-training methods, sources of pre-training data, and model architectures. In doing so, we discover a concerning trade-off: foundation models pre-trained on medical images achieve better overall performance but are consistently less fair than those pre-trained on natural images, with sometimes even worse fairness than baseline models trained from scratch. To mitigate these fairness disparities, we show that increasing both the volume of pre-training data and the number of pre-training epochs enhances the subgroup fairness of medical imaging pre-trained models. Furthermore, to decouple the fairness biases introduced in the pre-training and fine-tuning stages, we employ balanced datasets for fine-tuning. While fine-tuning on balanced datasets partially mitigates fairness issues, it is insufficient to completely eliminate the biases from the pre-training stage, underscoring the need for careful design and evaluation of medical imaging foundation models. Our granular analysis reveals that medical imaging pre-trained models tend to favor majority racial subgroups (White, Asian), whereas natural imaging pre-trained models tend to favor minority racial subgroups (Black). Additionally, across all foundation models, we observe a consistent underperformance on the female patient cohort. As the community moves towards designing specialized foundation models for medical imaging, we hope our timely research provides crucial insights to help inform more equitable model development.
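To make the notion of subgroup fairness concrete, the sketch below shows one common way such disparities are quantified: computing a performance metric per demographic subgroup and reporting the gap between the best- and worst-performing subgroups. This is an illustrative assumption, not the paper's exact protocol; the metric (AUC here), the grouping variables, and the function subgroup_auc_gap are hypothetical choices for this example.

# Minimal sketch of a subgroup fairness check for a binary classifier.
# Assumption: AUC as the performance metric and the best-minus-worst subgroup
# gap as the fairness measure; the paper's exact metrics may differ.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc_gap(y_true, y_score, groups):
    """Return per-subgroup AUCs and the gap between best and worst subgroups."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    aucs = {}
    for g in np.unique(groups):
        mask = groups == g
        # AUC is undefined if a subgroup contains only one class; skip it.
        if len(np.unique(y_true[mask])) < 2:
            continue
        aucs[g] = roc_auc_score(y_true[mask], y_score[mask])
    gap = max(aucs.values()) - min(aucs.values())
    return aucs, gap

# Toy usage with synthetic predictions and hypothetical race labels.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.3 + rng.random(200) * 0.7, 0, 1)
race = rng.choice(["White", "Asian", "Black"], size=200)
aucs, gap = subgroup_auc_gap(y_true, y_score, race)
print(aucs, gap)

The same computation applied to sex subgroups would surface the consistent underperformance on female patients that the abstract reports.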

Cite this Paper


BibTeX
@InProceedings{pmlr-v225-khan23a,
  title     = {How Fair are Medical Imaging Foundation Models?},
  author    = {Khan, Muhammad Osama and Afzal, Muhammad Muneeb and Mirza, Shujaat and Fang, Yi},
  booktitle = {Proceedings of the 3rd Machine Learning for Health Symposium},
  pages     = {217--231},
  year      = {2023},
  editor    = {Hegselmann, Stefan and Parziale, Antonio and Shanmugam, Divya and Tang, Shengpu and Asiedu, Mercy Nyamewaa and Chang, Serina and Hartvigsen, Tom and Singh, Harvineet},
  volume    = {225},
  series    = {Proceedings of Machine Learning Research},
  month     = {10 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v225/khan23a/khan23a.pdf},
  url       = {https://proceedings.mlr.press/v225/khan23a.html}
}
Endnote
%0 Conference Paper
%T How Fair are Medical Imaging Foundation Models?
%A Muhammad Osama Khan
%A Muhammad Muneeb Afzal
%A Shujaat Mirza
%A Yi Fang
%B Proceedings of the 3rd Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2023
%E Stefan Hegselmann
%E Antonio Parziale
%E Divya Shanmugam
%E Shengpu Tang
%E Mercy Nyamewaa Asiedu
%E Serina Chang
%E Tom Hartvigsen
%E Harvineet Singh
%F pmlr-v225-khan23a
%I PMLR
%P 217--231
%U https://proceedings.mlr.press/v225/khan23a.html
%V 225
APA
Khan, M. O., Afzal, M. M., Mirza, S., & Fang, Y. (2023). How Fair are Medical Imaging Foundation Models? Proceedings of the 3rd Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 225:217-231. Available from https://proceedings.mlr.press/v225/khan23a.html.