Reducing Reliance on Spurious Features in Medical Image Classification with Spatial Specificity
Proceedings of the 7th Machine Learning for Healthcare Conference, PMLR 182:760-784, 2022.
Abstract
A common failure mode of neural networks trained to classify abnormalities in medical images is their reliance on spurious features, i.e., features that are associated with the class label but do not generalize. In this work, we examine whether supervising models with increased spatial specificity (i.e., information about the location of the abnormality) affects model reliance on spurious features. We first propose a data model of spurious features and theoretically analyze the impact of increasing spatial specificity. We find that two properties of the data change as spatial specificity increases: the variance of the positively-labeled input pixels decreases and the mutual information between abnormal and spurious pixels decreases, both of which contribute to improved model robustness to spurious features. However, supervising models with greater spatial specificity incurs higher annotation costs, since training data must be labeled for the location of the abnormality, leading to a trade-off between annotation cost and model robustness to spurious features. We investigate this trade-off by varying the coarseness of the spatial specificity supplied and sweeping the number of training samples that carry information about the abnormality location. Further, we assess whether semi-supervised and contrastive learning methods improve the cost-robustness trade-off. We empirically examine the impact of supervising models with increased spatial specificity on two medical image datasets known to have spurious features: pneumothorax classification on chest X-rays and melanoma classification from dermoscopic images. We find that while models supervised with binary labels have near-random robust performance (robust AUROC of 0.46), increasing spatial specificity to bounding box detection and image segmentation achieves robust AUROCs of 0.72 and 0.82, respectively, on the pneumothorax classification task. We observe the same trend for the melanoma task, where segmentation models achieve a robust AUROC of 0.73, compared to worse-than-random performance for models trained with binary labels. Moreover, by leveraging semi-supervised and contrastive methods, models achieve a 5-point gain in robust AUROC when very few training samples are available.
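To make the notion of "increasing spatial specificity" concrete, the sketch below contrasts the coarsest supervision level (a single binary image label) with the finest (a per-pixel abnormality mask), and shows how a segmentation model can still be scored as a classifier for AUROC. This is a minimal illustration and not the authors' implementation: the loss choices and the max-pooling reduction from pixel predictions to an image-level score are assumptions made here for exposition.

```python
# Minimal sketch (PyTorch) of two supervision levels from the abstract.
# Assumptions: binary cross-entropy losses and max-pooling over pixel
# probabilities; the paper's actual objectives and pooling may differ.
import torch
import torch.nn.functional as F

def binary_label_loss(image_logit: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    # Coarsest supervision: one abnormal/normal label per image.
    return F.binary_cross_entropy_with_logits(image_logit, label)

def segmentation_loss(pixel_logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Finest supervision: a per-pixel mask marking the abnormality location.
    return F.binary_cross_entropy_with_logits(pixel_logits, mask)

def image_score_from_pixels(pixel_logits: torch.Tensor) -> torch.Tensor:
    # To report classification AUROC for a segmentation model, reduce the
    # predicted pixel map to a single image-level score (here, the maximum
    # pixel probability per image).
    return torch.sigmoid(pixel_logits).amax(dim=(-2, -1))
```

The image-level scores produced either way can then be passed to a standard AUROC routine (e.g., sklearn.metrics.roc_auc_score) on an evaluation split in which the spurious feature is decorrelated from the label; how exactly that robust evaluation split is constructed follows the paper, not this sketch.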