Radiology Reports Improve Visual Representations Learned from Radiographs

Haoxu Huang, Samyak Rawlekar, Sumit Chopra, Cem M Deniz
Medical Imaging with Deep Learning, PMLR 227:1385-1405, 2024.

Abstract

Although human’s ability to visually understand the structure of the World plays a crucial role in perceiving the World and making appropriate decisions, human perception does not solely rely on vision but amalgamates the information from acoustic, verbal, and visual stimuli. An active area of research has been revolving around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets like ImageNet, a limited number of studies have been carried out in the biomedical domain. In this work, we extend the available frameworks for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, ”For multi-modal learning, self-supervised learning and joint learning using both learning strategies, which one improves the visual representation for downstream chest radiographs classification tasks the most?”. Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, the joint learning with multi-modal and self-supervised models outperforms self-supervised learning and is at par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v227-huang24a, title = {Radiology Reports Improve Visual Representations Learned from Radiographs}, author = {Huang, Haoxu and Rawlekar, Samyak and Chopra, Sumit and Deniz, Cem M}, booktitle = {Medical Imaging with Deep Learning}, pages = {1385--1405}, year = {2024}, editor = {Oguz, Ipek and Noble, Jack and Li, Xiaoxiao and Styner, Martin and Baumgartner, Christian and Rusu, Mirabela and Heinmann, Tobias and Kontos, Despina and Landman, Bennett and Dawant, Benoit}, volume = {227}, series = {Proceedings of Machine Learning Research}, month = {10--12 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v227/huang24a/huang24a.pdf}, url = {https://proceedings.mlr.press/v227/huang24a.html}, abstract = {Although human’s ability to visually understand the structure of the World plays a crucial role in perceiving the World and making appropriate decisions, human perception does not solely rely on vision but amalgamates the information from acoustic, verbal, and visual stimuli. An active area of research has been revolving around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets like ImageNet, a limited number of studies have been carried out in the biomedical domain. In this work, we extend the available frameworks for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, ”For multi-modal learning, self-supervised learning and joint learning using both learning strategies, which one improves the visual representation for downstream chest radiographs classification tasks the most?”. Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, the joint learning with multi-modal and self-supervised models outperforms self-supervised learning and is at par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets.} }
Endnote
%0 Conference Paper %T Radiology Reports Improve Visual Representations Learned from Radiographs %A Haoxu Huang %A Samyak Rawlekar %A Sumit Chopra %A Cem M Deniz %B Medical Imaging with Deep Learning %C Proceedings of Machine Learning Research %D 2024 %E Ipek Oguz %E Jack Noble %E Xiaoxiao Li %E Martin Styner %E Christian Baumgartner %E Mirabela Rusu %E Tobias Heinmann %E Despina Kontos %E Bennett Landman %E Benoit Dawant %F pmlr-v227-huang24a %I PMLR %P 1385--1405 %U https://proceedings.mlr.press/v227/huang24a.html %V 227 %X Although human’s ability to visually understand the structure of the World plays a crucial role in perceiving the World and making appropriate decisions, human perception does not solely rely on vision but amalgamates the information from acoustic, verbal, and visual stimuli. An active area of research has been revolving around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets like ImageNet, a limited number of studies have been carried out in the biomedical domain. In this work, we extend the available frameworks for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, ”For multi-modal learning, self-supervised learning and joint learning using both learning strategies, which one improves the visual representation for downstream chest radiographs classification tasks the most?”. Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, the joint learning with multi-modal and self-supervised models outperforms self-supervised learning and is at par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets.
APA
Huang, H., Rawlekar, S., Chopra, S. & Deniz, C.M.. (2024). Radiology Reports Improve Visual Representations Learned from Radiographs. Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 227:1385-1405 Available from https://proceedings.mlr.press/v227/huang24a.html.

Related Material