Automated ranking of chest x-ray radiological finding severity in a binary label setting

Matthew Macpherson, Keerthini Muthuswamy, Ashik Amlani, Vicky Goh, Giovanni Montana
Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, PMLR 250:949-963, 2024.

Abstract

Machine learning has demonstrated the ability to match or exceed human performance in detecting a range of abnormalities in chest x-rays. However, current models largely operate within a binary classification paradigm, with findings either present or absent using fixed decision thresholds, whereas many clinical findings can be more usefully described on a scale of severity which a skilled radiologist will incorporate into a more nuanced report. This limitation is due, in part, to the difficulty and expense of manually annotating fine-grained labels for training and test images versus the relative ease of automatically extracting binary labels from the associated free text reports using NLP algorithms. In this paper we examine the ability of models trained with only binary training data to give useful abnormality severity information from their raw outputs. We assess performance on a ranking task using manually ranked test sets for each of five findings: cardiomegaly, consolidation, paratracheal hilar changes, pleural effusion and subcutaneous emphysema. We find the raw model output agrees with human-assessed severity ranking, with Spearman's rank coefficients between 0.563 and 0.848. Using patient age as an additional radiological finding with full ground truth ranking available, we go on to compare a binary classifier output against a fully supervised RankNet model, quantifying the reduction in training data required in the fully supervised setting for equivalent performance. We show that model performance is improved using a semi-supervised approach supplementing a smaller set of fully supervised images with a larger set of binary-labelled images.
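The agreement measure named in the abstract, Spearman's rank coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `average_ranks` and `spearman`, and the six example scores and ranks, are hypothetical and chosen only to show how raw classifier outputs might be compared against a human severity ranking.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical data (not from the paper): raw outputs of a binary classifier
# for six images, and a human severity ranking of the same images (1 = mildest).
model_scores = [0.12, 0.55, 0.48, 0.91, 0.77, 0.30]
human_ranks = [1, 3, 4, 6, 5, 2]

rho = spearman(model_scores, human_ranks)
```

Only the ordering of the model outputs matters here, so the coefficient is unchanged by any monotonic rescaling of the scores; that is why a raw, uncalibrated output can be assessed this way.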

Cite this Paper


BibTeX
@InProceedings{pmlr-v250-macpherson24a,
  title = {Automated ranking of chest x-ray radiological finding severity in a binary label setting},
  author = {Macpherson, Matthew and Muthuswamy, Keerthini and Amlani, Ashik and Goh, Vicky and Montana, Giovanni},
  booktitle = {Proceedings of The 7th International Conference on Medical Imaging with Deep Learning},
  pages = {949--963},
  year = {2024},
  editor = {Burgos, Ninon and Petitjean, Caroline and Vakalopoulou, Maria and Christodoulidis, Stergios and Coupe, Pierrick and Delingette, Hervé and Lartizien, Carole and Mateus, Diana},
  volume = {250},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v250/main/assets/macpherson24a/macpherson24a.pdf},
  url = {https://proceedings.mlr.press/v250/macpherson24a.html},
  abstract = {Machine learning has demonstrated the ability to match or exceed human performance in detecting a range of abnormalities in chest x-rays. However, current models largely operate within a binary classification paradigm, with findings either present or absent using fixed decision thresholds, whereas many clinical findings can be more usefully described on a scale of severity which a skilled radiologist will incorporate into a more nuanced report. This limitation is due, in part, to the difficulty and expense of manually annotating fine-grained labels for training and test images versus the relative ease of automatically extracting binary labels from the associated free text reports using NLP algorithms. In this paper we examine the ability of models trained with only binary training data to give useful abnormality severity information from their raw outputs. We assess performance on a ranking task using manually ranked test sets for each of five findings: cardiomegaly, consolidation, paratracheal hilar changes, pleural effusion and subcutaneous emphysema. We find the raw model output agrees with human-assessed severity ranking, with Spearman's rank coefficients between 0.563 and 0.848. Using patient age as an additional radiological finding with full ground truth ranking available, we go on to compare a binary classifier output against a fully supervised RankNet model, quantifying the reduction in training data required in the fully supervised setting for equivalent performance. We show that model performance is improved using a semi-supervised approach supplementing a smaller set of fully supervised images with a larger set of binary-labelled images.}
}
Endnote
%0 Conference Paper
%T Automated ranking of chest x-ray radiological finding severity in a binary label setting
%A Matthew Macpherson
%A Keerthini Muthuswamy
%A Ashik Amlani
%A Vicky Goh
%A Giovanni Montana
%B Proceedings of The 7th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ninon Burgos
%E Caroline Petitjean
%E Maria Vakalopoulou
%E Stergios Christodoulidis
%E Pierrick Coupe
%E Hervé Delingette
%E Carole Lartizien
%E Diana Mateus
%F pmlr-v250-macpherson24a
%I PMLR
%P 949--963
%U https://proceedings.mlr.press/v250/macpherson24a.html
%V 250
%X Machine learning has demonstrated the ability to match or exceed human performance in detecting a range of abnormalities in chest x-rays. However, current models largely operate within a binary classification paradigm, with findings either present or absent using fixed decision thresholds, whereas many clinical findings can be more usefully described on a scale of severity which a skilled radiologist will incorporate into a more nuanced report. This limitation is due, in part, to the difficulty and expense of manually annotating fine-grained labels for training and test images versus the relative ease of automatically extracting binary labels from the associated free text reports using NLP algorithms. In this paper we examine the ability of models trained with only binary training data to give useful abnormality severity information from their raw outputs. We assess performance on a ranking task using manually ranked test sets for each of five findings: cardiomegaly, consolidation, paratracheal hilar changes, pleural effusion and subcutaneous emphysema. We find the raw model output agrees with human-assessed severity ranking, with Spearman's rank coefficients between 0.563 and 0.848. Using patient age as an additional radiological finding with full ground truth ranking available, we go on to compare a binary classifier output against a fully supervised RankNet model, quantifying the reduction in training data required in the fully supervised setting for equivalent performance. We show that model performance is improved using a semi-supervised approach supplementing a smaller set of fully supervised images with a larger set of binary-labelled images.
APA
Macpherson, M., Muthuswamy, K., Amlani, A., Goh, V. & Montana, G. (2024). Automated ranking of chest x-ray radiological finding severity in a binary label setting. Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 250:949-963. Available from https://proceedings.mlr.press/v250/macpherson24a.html.
