Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning

Vicente Balmaseda, Bokun Wang, Ching-Long Lin, Tianbao Yang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:2697-2714, 2025.

Abstract

In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
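The filtering idea in the abstract can be illustrated with a minimal sketch. This is not GloFND's actual algorithm: in the paper the per-anchor threshold is learned on the fly via optimization, whereas here it is passed in as a fixed value, and the loss shown is a plain InfoNCE with suspected false negatives masked out. Function and variable names are illustrative.

```python
import numpy as np

def masked_infonce(anchor, positive, negatives, lam, tau=0.5):
    """InfoNCE loss for a single anchor, dropping negatives whose cosine
    similarity to the anchor exceeds the per-anchor threshold `lam`
    (i.e., treating them as suspected false negatives).

    anchor: (d,) embedding; positive: (d,); negatives: (m, d);
    lam: threshold for this anchor (learned in the paper, fixed here);
    tau: temperature.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sim = cos(anchor, positive)
    neg_sims = np.array([cos(anchor, n) for n in negatives])

    # Keep only negatives at or below the threshold; the rest are
    # excluded so their embeddings are not pushed away from the anchor.
    kept = neg_sims[neg_sims <= lam]

    # Standard InfoNCE: -log softmax of the positive over retained pairs.
    logits = np.concatenate(([pos_sim], kept)) / tau
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

With a threshold low enough to mask a semantically near-duplicate "negative", the loss no longer penalizes the anchor for being close to it, which is the effect the method aims for at the full-dataset scale rather than within a mini-batch.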

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-balmaseda25a,
  title     = {Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning},
  author    = {Balmaseda, Vicente and Wang, Bokun and Lin, Ching-Long and Yang, Tianbao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {2697--2714},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/balmaseda25a/balmaseda25a.pdf},
  url       = {https://proceedings.mlr.press/v267/balmaseda25a.html},
  abstract  = {In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.}
}
Endnote
%0 Conference Paper
%T Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
%A Vicente Balmaseda
%A Bokun Wang
%A Ching-Long Lin
%A Tianbao Yang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-balmaseda25a
%I PMLR
%P 2697--2714
%U https://proceedings.mlr.press/v267/balmaseda25a.html
%V 267
%X In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
APA
Balmaseda, V., Wang, B., Lin, C. & Yang, T. (2025). Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:2697-2714. Available from https://proceedings.mlr.press/v267/balmaseda25a.html.

Related Material