CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays

Emma Chen, Andy Kim, Rayan Krishnan, Jin Long, Andrew Y. Ng, Pranav Rajpurkar
Proceedings of the 6th Machine Learning for Healthcare Conference, PMLR 149:103-125, 2021.

Abstract

A major obstacle to the integration of deep learning models for chest x-ray interpretation into clinical settings is the lack of understanding of their failure modes. In this work, we first investigate whether there are patient subgroups that chest x-ray models are likely to misclassify. We find that patient age and the radiographic findings of lung lesion, pneumothorax, or support devices are statistically relevant features for predicting misclassification for some chest x-ray models. Second, we develop misclassification predictors on chest x-ray models using their outputs and clinical features. We find that our best performing misclassification identifier achieves an AUROC close to 0.9 for most diseases. Third, employing our misclassification identifiers, we develop a corrective algorithm to selectively flip model predictions that have high likelihood of misclassification at inference time. We observe F1 improvement on the prediction of Consolidation (0.008 [95% CI 0.005, 0.010]) and Edema (0.003 [95% CI 0.001, 0.006]). By carrying out our investigation on ten distinct and high-performing chest x-ray models, we are able to derive insights across model architectures and offer a generalizable framework applicable to other medical imaging tasks.
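The sketch below is a minimal illustration (not the authors' exact implementation) of the two ideas summarized above: a misclassification predictor trained on a chest x-ray model's output probabilities plus clinical features, and a corrective step that flips the predictions it flags as likely errors. The feature layout, the choice of logistic regression, and the flip threshold are illustrative assumptions.

    # Hypothetical sketch of misclassification identification and selective prediction flipping.
    # Classifier choice, feature names, and thresholds are assumptions, not the paper's settings.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_misclassification_predictor(model_probs, clinical_features, labels, threshold=0.5):
        """Fit a predictor of whether the x-ray model's binarized output is wrong."""
        preds = (model_probs >= threshold).astype(int)
        errors = (preds != labels).astype(int)           # 1 = misclassified example
        X = np.column_stack([model_probs, clinical_features])
        return LogisticRegression(max_iter=1000).fit(X, errors)

    def corrected_predictions(model_probs, clinical_features, error_model,
                              flip_threshold=0.9, threshold=0.5):
        """Flip predictions whose estimated misclassification probability is high."""
        preds = (model_probs >= threshold).astype(int)
        X = np.column_stack([model_probs, clinical_features])
        p_error = error_model.predict_proba(X)[:, 1]
        flip = p_error >= flip_threshold                  # only flip confident errors
        preds[flip] = 1 - preds[flip]
        return preds

In practice, the error model would be fit on a held-out validation split per disease, and the flip threshold tuned there before applying the correction at inference time.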

Cite this Paper


BibTeX
@InProceedings{pmlr-v149-chen21a,
  title     = {CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays},
  author    = {Chen, Emma and Kim, Andy and Krishnan, Rayan and Long, Jin and Ng, Andrew Y. and Rajpurkar, Pranav},
  booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference},
  pages     = {103--125},
  year      = {2021},
  editor    = {Jung, Ken and Yeung, Serena and Sendak, Mark and Sjoding, Michael and Ranganath, Rajesh},
  volume    = {149},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--07 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v149/chen21a/chen21a.pdf},
  url       = {https://proceedings.mlr.press/v149/chen21a.html},
  abstract  = {A major obstacle to the integration of deep learning models for chest x-ray interpretation into clinical settings is the lack of understanding of their failure modes. In this work, we first investigate whether there are patient subgroups that chest x-ray models are likely to misclassify. We find that patient age and the radiographic findings of lung lesion, pneumothorax, or support devices are statistically relevant features for predicting misclassification for some chest x-ray models. Second, we develop misclassification predictors on chest x-ray models using their outputs and clinical features. We find that our best performing misclassification identifier achieves an AUROC close to 0.9 for most diseases. Third, employing our misclassification identifiers, we develop a corrective algorithm to selectively flip model predictions that have high likelihood of misclassification at inference time. We observe F1 improvement on the prediction of Consolidation (0.008 [95% CI 0.005, 0.010]) and Edema (0.003 [95% CI 0.001, 0.006]). By carrying out our investigation on ten distinct and high-performing chest x-ray models, we are able to derive insights across model architectures and offer a generalizable framework applicable to other medical imaging tasks.}
}
Endnote
%0 Conference Paper
%T CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays
%A Emma Chen
%A Andy Kim
%A Rayan Krishnan
%A Jin Long
%A Andrew Y. Ng
%A Pranav Rajpurkar
%B Proceedings of the 6th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2021
%E Ken Jung
%E Serena Yeung
%E Mark Sendak
%E Michael Sjoding
%E Rajesh Ranganath
%F pmlr-v149-chen21a
%I PMLR
%P 103--125
%U https://proceedings.mlr.press/v149/chen21a.html
%V 149
%X A major obstacle to the integration of deep learning models for chest x-ray interpretation into clinical settings is the lack of understanding of their failure modes. In this work, we first investigate whether there are patient subgroups that chest x-ray models are likely to misclassify. We find that patient age and the radiographic findings of lung lesion, pneumothorax, or support devices are statistically relevant features for predicting misclassification for some chest x-ray models. Second, we develop misclassification predictors on chest x-ray models using their outputs and clinical features. We find that our best performing misclassification identifier achieves an AUROC close to 0.9 for most diseases. Third, employing our misclassification identifiers, we develop a corrective algorithm to selectively flip model predictions that have high likelihood of misclassification at inference time. We observe F1 improvement on the prediction of Consolidation (0.008 [95% CI 0.005, 0.010]) and Edema (0.003 [95% CI 0.001, 0.006]). By carrying out our investigation on ten distinct and high-performing chest x-ray models, we are able to derive insights across model architectures and offer a generalizable framework applicable to other medical imaging tasks.
APA
Chen, E., Kim, A., Krishnan, R., Long, J., Ng, A.Y. & Rajpurkar, P. (2021). CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays. Proceedings of the 6th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 149:103-125. Available from https://proceedings.mlr.press/v149/chen21a.html.
