Sample-Specific Debiasing for Better Image-Text Models

Peiqi Wang, Yingcheng Liu, Ching-Yun Ko, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland
Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:788-803, 2023.

Abstract

Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments illustrate empirical advantages of sample-specific debiasing.

Cite this Paper


BibTeX
@InProceedings{pmlr-v219-wang23b, title = {Sample-Specific Debiasing for Better Image-Text Models}, author = {Wang, Peiqi and Liu, Yingcheng and Ko, Ching-Yun and Wells, William M. and Berkowitz, Seth and Horng, Steven and Golland, Polina}, booktitle = {Proceedings of the 8th Machine Learning for Healthcare Conference}, pages = {788--803}, year = {2023}, editor = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo and Yeung, Serene}, volume = {219}, series = {Proceedings of Machine Learning Research}, month = {11--12 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v219/wang23b/wang23b.pdf}, url = {https://proceedings.mlr.press/v219/wang23b.html}, abstract = {Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments illustrate empirical advantages of sample-specific debiasing.} }
Endnote
%0 Conference Paper %T Sample-Specific Debiasing for Better Image-Text Models %A Peiqi Wang %A Yingcheng Liu %A Ching-Yun Ko %A William M. Wells %A Seth Berkowitz %A Steven Horng %A Polina Golland %B Proceedings of the 8th Machine Learning for Healthcare Conference %C Proceedings of Machine Learning Research %D 2023 %E Kaivalya Deshpande %E Madalina Fiterau %E Shalmali Joshi %E Zachary Lipton %E Rajesh Ranganath %E Iñigo Urteaga %E Serene Yeung %F pmlr-v219-wang23b %I PMLR %P 788--803 %U https://proceedings.mlr.press/v219/wang23b.html %V 219 %X Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments illustrate empirical advantages of sample-specific debiasing.
APA
Wang, P., Liu, Y., Ko, C., Wells, W.M., Berkowitz, S., Horng, S. & Golland, P.. (2023). Sample-Specific Debiasing for Better Image-Text Models. Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 219:788-803 Available from https://proceedings.mlr.press/v219/wang23b.html.

Related Material