CheXpert++: Approximating the CheXpert Labeler for Speed, Differentiability, and Probabilistic Output

Matthew B.A. McDermott, Tzu Ming Harry Hsu, Wei-Hung Weng, Marzyeh Ghassemi, Peter Szolovits
Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:913-927, 2020.

Abstract

It is often infeasible or impossible to obtain ground-truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge-driven labelers that ingest data and yield silver labels without any ground-truth training data. One popular such labeler is CheXpert (Irvin et al., 2019), which produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but it is relatively slow computationally (especially when integrated with end-to-end neural pipelines), it is non-differentiable, so it cannot be used in applications that require gradients to flow through the labeler, and it does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with CheXpert++, a BERT-based, high-fidelity approximation to CheXpert. CheXpert++ achieves 99.81% parity with CheXpert, which means it can reliably be used as a drop-in replacement, all while being significantly faster, fully differentiable, and probabilistic in output. Error analysis also demonstrates that CheXpert++ tends to correct errors in the CheXpert labels: when the two disagree, a clinician more often prefers the CheXpert++ label on all but one disease task. To further demonstrate the utility of these advantages, we conduct a proof-of-concept active learning study, showing that one iteration of active-learning-inspired re-training improves accuracy on an expert-labeled random subset of report sentences by approximately 8% over raw, unaltered CheXpert. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal and controllable human-labeling demands.
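The core idea of the abstract can be illustrated with a toy sketch: distill a rule-based labeler into a differentiable model that outputs probabilities. Everything below is illustrative, not the authors' implementation: the keyword rules, sentences, and a bag-of-words logistic regression stand in for CheXpert and BERT, respectively.

```python
import math

def rule_labeler(sentence):
    """Stand-in for a rule-based silver labeler: 1 if a finding keyword appears."""
    return 1 if any(k in sentence for k in ("edema", "opacity")) else 0

SENTENCES = [
    "mild pulmonary edema is present",
    "no acute findings",
    "focal opacity in the left lung",
    "heart size is normal",
] * 50

VOCAB = sorted({w for s in SENTENCES for w in s.split()})

def featurize(sentence):
    """Binary bag-of-words vector over the toy vocabulary."""
    words = set(sentence.split())
    return [1.0 if w in words else 0.0 for w in VOCAB]

# Distillation step: fit a differentiable model (logistic regression here,
# BERT in the paper) to the silver labels via gradient descent on log-loss.
w = [0.0] * len(VOCAB)
b = 0.0
lr = 0.5
data = [(featurize(s), rule_labeler(s)) for s in SENTENCES]
for _ in range(200):
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        g = p - y  # gradient of log-loss w.r.t. the logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

def predict_proba(sentence):
    """Probabilistic output -- the property the rule-based labeler lacks."""
    x = featurize(sentence)
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
```

The trained model agrees with the rule labeler on this toy data (the analogue of the parity result) while also exposing calibrated-ish confidences, which is what enables the active-learning step the abstract describes: low-confidence examples can be routed to a human for labeling and re-training.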

Cite this Paper


BibTeX
@InProceedings{pmlr-v126-mcdermott20a,
  title     = {CheXpert++: Approximating the CheXpert Labeler for Speed, Differentiability, and Probabilistic Output},
  author    = {McDermott, Matthew B.A. and Hsu, Tzu Ming Harry and Weng, Wei-Hung and Ghassemi, Marzyeh and Szolovits, Peter},
  booktitle = {Proceedings of the 5th Machine Learning for Healthcare Conference},
  pages     = {913--927},
  year      = {2020},
  editor    = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume    = {126},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v126/mcdermott20a/mcdermott20a.pdf},
  url       = {https://proceedings.mlr.press/v126/mcdermott20a.html},
  abstract  = {It is often infeasible or impossible to obtain ground truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge driven labelers to ingest data and yield silver labels absent any ground-truth training data. One popular such labeler is CheXpert (Irvin et al., 2019), a labeler that produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but is relatively computationally slow, especially when integrated with end-to-end neural pipelines, is non-differentiable so can’t be used in any applications that require gradients to flow through the labeler, and does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with CheXpert++, a BERT-based, high-fidelity approximation to CheXpert. CheXpert++ achieves 99.81% parity with CheXpert, which means it can be reliably used as a drop-in replacement for CheXpert, all while being significantly faster, fully differentiable, and probabilistic in output. Error analysis of CheXpert++ also demonstrates that CheXpert++ has a tendency to actually correct errors in the CheXpert labels, with CheXpert++ labels being more often preferred by a clinician over CheXpert labels (when they disagree) on all but one disease task. To further demonstrate the utility of these advantages in this model, we conduct a proof-of-concept active learning study, demonstrating we can improve accuracy on an expert labeled random subset of report sentences by approximately 8% over raw, unaltered CheXpert by using one-iteration of active-learning inspired re-training. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal, and controllable human labeling demands.}
}
Endnote
%0 Conference Paper
%T CheXpert++: Approximating the CheXpert Labeler for Speed, Differentiability, and Probabilistic Output
%A Matthew B.A. McDermott
%A Tzu Ming Harry Hsu
%A Wei-Hung Weng
%A Marzyeh Ghassemi
%A Peter Szolovits
%B Proceedings of the 5th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v126-mcdermott20a
%I PMLR
%P 913--927
%U https://proceedings.mlr.press/v126/mcdermott20a.html
%V 126
%X It is often infeasible or impossible to obtain ground truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge driven labelers to ingest data and yield silver labels absent any ground-truth training data. One popular such labeler is CheXpert (Irvin et al., 2019), a labeler that produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but is relatively computationally slow, especially when integrated with end-to-end neural pipelines, is non-differentiable so can’t be used in any applications that require gradients to flow through the labeler, and does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with CheXpert++, a BERT-based, high-fidelity approximation to CheXpert. CheXpert++ achieves 99.81% parity with CheXpert, which means it can be reliably used as a drop-in replacement for CheXpert, all while being significantly faster, fully differentiable, and probabilistic in output. Error analysis of CheXpert++ also demonstrates that CheXpert++ has a tendency to actually correct errors in the CheXpert labels, with CheXpert++ labels being more often preferred by a clinician over CheXpert labels (when they disagree) on all but one disease task. To further demonstrate the utility of these advantages in this model, we conduct a proof-of-concept active learning study, demonstrating we can improve accuracy on an expert labeled random subset of report sentences by approximately 8% over raw, unaltered CheXpert by using one-iteration of active-learning inspired re-training. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal, and controllable human labeling demands.
APA
McDermott, M.B., Hsu, T.M.H., Weng, W., Ghassemi, M. & Szolovits, P. (2020). CheXpert++: Approximating the CheXpert Labeler for Speed, Differentiability, and Probabilistic Output. Proceedings of the 5th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 126:913-927. Available from https://proceedings.mlr.press/v126/mcdermott20a.html.
