MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation

Yen Nhi Truong Vu, Richard Wang, Niranjan Balachandar, Can Liu, Andrew Y. Ng, Pranav Rajpurkar
Proceedings of the 6th Machine Learning for Healthcare Conference, PMLR 149:755-769, 2021.

Abstract

Self-supervised contrastive learning between pairs of multiple views of the same image has been shown to successfully leverage unlabeled data to produce meaningful visual representations for both natural and medical images. However, there has been limited work on determining how to select pairs for medical images, where availability of patient metadata can be leveraged to improve representations. In this work, we develop a method to select positive pairs coming from views of possibly different images through the use of patient metadata. We compare strategies for selecting positive pairs for chest X-ray interpretation including requiring them to be from the same patient, imaging study or laterality. We evaluate downstream task performance by fine-tuning the linear layer on 1% of the labeled dataset for pleural effusion classification. Our best performing positive pair selection strategy, which involves using images from the same patient from the same study across all lateralities, achieves a performance increase of 14.4% in mean AUC from the ImageNet pretrained baseline. Our controlled experiments show that the keys to improving downstream performance on disease classification are (1) using patient metadata to appropriately create positive pairs from different images with the same underlying pathologies, and (2) maximizing the number of different images used in query pairing. In addition, we explore leveraging patient metadata to select hard negative pairs for contrastive learning, but do not find improvement over baselines that do not use metadata. Our method is broadly applicable to medical image interpretation and allows flexibility for incorporating medical insights in choosing pairs for contrastive learning.
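
To make the pair-selection idea concrete, the sketch below enumerates positive pairs from a patient metadata table under the strategies the abstract compares (same patient, same study, same laterality). It is a minimal illustration, not the paper's implementation: the field names (patient_id, study_id, laterality), the positive_pairs helper, and the example records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical metadata records: each image is described by a patient ID,
# a study ID, and a laterality. Schema and values are illustrative only.
images = [
    {"path": "p1_s1_frontal.jpg", "patient_id": "p1", "study_id": "s1", "laterality": "frontal"},
    {"path": "p1_s1_lateral.jpg", "patient_id": "p1", "study_id": "s1", "laterality": "lateral"},
    {"path": "p1_s2_frontal.jpg", "patient_id": "p1", "study_id": "s2", "laterality": "frontal"},
    {"path": "p2_s1_frontal.jpg", "patient_id": "p2", "study_id": "s1", "laterality": "frontal"},
]

def positive_pairs(images, same_study=True, same_laterality=False):
    """Pair images from the same patient, optionally also requiring the
    same study and/or the same laterality, and return all such pairs."""
    groups = defaultdict(list)
    for img in images:
        key = (
            img["patient_id"],  # same patient is always required
            img["study_id"] if same_study else None,
            img["laterality"] if same_laterality else None,
        )
        groups[key].append(img["path"])
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return pairs

# Best-performing strategy reported in the abstract: images from the same
# patient and the same study, across all lateralities.
print(positive_pairs(images, same_study=True, same_laterality=False))
# -> [('p1_s1_frontal.jpg', 'p1_s1_lateral.jpg')]
```

Each returned pair would then serve as two "views" of the same underlying pathology for the contrastive objective, in place of (or in addition to) the standard two augmentations of a single image.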

Cite this Paper


BibTeX
@InProceedings{pmlr-v149-vu21a,
  title = {MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation},
  author = {Vu, Yen Nhi Truong and Wang, Richard and Balachandar, Niranjan and Liu, Can and Ng, Andrew Y. and Rajpurkar, Pranav},
  booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference},
  pages = {755--769},
  year = {2021},
  editor = {Jung, Ken and Yeung, Serena and Sendak, Mark and Sjoding, Michael and Ranganath, Rajesh},
  volume = {149},
  series = {Proceedings of Machine Learning Research},
  month = {06--07 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v149/vu21a/vu21a.pdf},
  url = {https://proceedings.mlr.press/v149/vu21a.html},
  abstract = {Self-supervised contrastive learning between pairs of multiple views of the same image has been shown to successfully leverage unlabeled data to produce meaningful visual representations for both natural and medical images. However, there has been limited work on determining how to select pairs for medical images, where availability of patient metadata can be leveraged to improve representations. In this work, we develop a method to select positive pairs coming from views of possibly different images through the use of patient metadata. We compare strategies for selecting positive pairs for chest X-ray interpretation including requiring them to be from the same patient, imaging study or laterality. We evaluate downstream task performance by fine-tuning the linear layer on 1% of the labeled dataset for pleural effusion classification. Our best performing positive pair selection strategy, which involves using images from the same patient from the same study across all lateralities, achieves a performance increase of 14.4% in mean AUC from the ImageNet pretrained baseline. Our controlled experiments show that the keys to improving downstream performance on disease classification are (1) using patient metadata to appropriately create positive pairs from different images with the same underlying pathologies, and (2) maximizing the number of different images used in query pairing. In addition, we explore leveraging patient metadata to select hard negative pairs for contrastive learning, but do not find improvement over baselines that do not use metadata. Our method is broadly applicable to medical image interpretation and allows flexibility for incorporating medical insights in choosing pairs for contrastive learning.}
}
Endnote
%0 Conference Paper
%T MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation
%A Yen Nhi Truong Vu
%A Richard Wang
%A Niranjan Balachandar
%A Can Liu
%A Andrew Y. Ng
%A Pranav Rajpurkar
%B Proceedings of the 6th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2021
%E Ken Jung
%E Serena Yeung
%E Mark Sendak
%E Michael Sjoding
%E Rajesh Ranganath
%F pmlr-v149-vu21a
%I PMLR
%P 755--769
%U https://proceedings.mlr.press/v149/vu21a.html
%V 149
%X Self-supervised contrastive learning between pairs of multiple views of the same image has been shown to successfully leverage unlabeled data to produce meaningful visual representations for both natural and medical images. However, there has been limited work on determining how to select pairs for medical images, where availability of patient metadata can be leveraged to improve representations. In this work, we develop a method to select positive pairs coming from views of possibly different images through the use of patient metadata. We compare strategies for selecting positive pairs for chest X-ray interpretation including requiring them to be from the same patient, imaging study or laterality. We evaluate downstream task performance by fine-tuning the linear layer on 1% of the labeled dataset for pleural effusion classification. Our best performing positive pair selection strategy, which involves using images from the same patient from the same study across all lateralities, achieves a performance increase of 14.4% in mean AUC from the ImageNet pretrained baseline. Our controlled experiments show that the keys to improving downstream performance on disease classification are (1) using patient metadata to appropriately create positive pairs from different images with the same underlying pathologies, and (2) maximizing the number of different images used in query pairing. In addition, we explore leveraging patient metadata to select hard negative pairs for contrastive learning, but do not find improvement over baselines that do not use metadata. Our method is broadly applicable to medical image interpretation and allows flexibility for incorporating medical insights in choosing pairs for contrastive learning.
APA
Vu, Y.N.T., Wang, R., Balachandar, N., Liu, C., Ng, A.Y. & Rajpurkar, P. (2021). MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation. Proceedings of the 6th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 149:755-769. Available from https://proceedings.mlr.press/v149/vu21a.html.
