Less can be more in contrastive learning

Jovana Mitrovic, Brian McWilliams, Melanie Rey
Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, PMLR 137:70-75, 2020.

Abstract

Unsupervised representation learning provides an attractive alternative to its supervised counterpart because of the abundance of unlabelled data. Contrastive learning has recently emerged as one of the most successful approaches to unsupervised representation learning. Given a datapoint, contrastive learning involves discriminating between a matching, or positive, datapoint and a number of non-matching, or negative, ones. Usually the other datapoints in the batch serve as the negatives for the given datapoint. It has been shown empirically that large batch sizes are needed to achieve good performance, which led to the belief that a large number of negatives is preferable. In order to understand this phenomenon better, in this work we investigate the role of negatives in contrastive learning by decoupling the number of negatives from the batch size. Surprisingly, we discover that for a fixed batch size, performance actually degrades as the number of negatives is increased. We also show that using fewer negatives can lead to a better signal-to-noise ratio for the model gradients, which could explain the improved performance.
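
The setup described in the abstract can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch implementation of an InfoNCE-style contrastive loss in which the number of negatives per anchor is a free parameter, subsampled from the other in-batch examples, rather than being tied to the batch size. The function name, the use of cosine similarity, and the temperature parameter are illustrative assumptions and are not taken from the paper.

    import torch
    import torch.nn.functional as F

    def subsampled_info_nce(anchors, positives, num_negatives, temperature=0.1):
        # anchors, positives: (batch_size, dim) embeddings of two views of the
        # same datapoints; row i of `positives` is the positive for row i of `anchors`.
        batch_size = anchors.shape[0]
        assert 1 <= num_negatives <= batch_size - 1

        anchors = F.normalize(anchors, dim=1)
        positives = F.normalize(positives, dim=1)

        # Cosine similarity of every anchor to every positive: (batch_size, batch_size).
        sim = anchors @ positives.t()

        # Diagonal entries are the anchor/positive similarities.
        pos_sim = sim.diag().unsqueeze(1)  # (batch_size, 1)

        # Off-diagonal entries are candidate negatives (the other datapoints in the batch).
        off_diag = ~torch.eye(batch_size, dtype=torch.bool, device=sim.device)
        candidates = sim[off_diag].view(batch_size, batch_size - 1)

        # Subsample `num_negatives` negatives per anchor, without replacement,
        # so the number of negatives is decoupled from the batch size.
        perm = torch.rand(batch_size, batch_size - 1, device=sim.device).argsort(dim=1)
        neg_sim = torch.gather(candidates, 1, perm[:, :num_negatives])

        # Softmax cross-entropy over [positive, sampled negatives]; class 0 is the positive.
        logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
        targets = torch.zeros(batch_size, dtype=torch.long, device=sim.device)
        return F.cross_entropy(logits, targets)

Setting num_negatives = batch_size - 1 recovers the usual all-in-batch formulation; smaller values keep the batch size fixed while varying the number of negatives, which is the kind of comparison the abstract describes.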

Cite this Paper


BibTeX
@InProceedings{pmlr-v137-mitrovic20a,
  title = {Less can be more in contrastive learning},
  author = {Mitrovic, Jovana and McWilliams, Brian and Rey, Melanie},
  booktitle = {Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops},
  pages = {70--75},
  year = {2020},
  editor = {Zosa Forde, Jessica and Ruiz, Francisco and Pradier, Melanie F. and Schein, Aaron},
  volume = {137},
  series = {Proceedings of Machine Learning Research},
  month = {12 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v137/mitrovic20a/mitrovic20a.pdf},
  url = {https://proceedings.mlr.press/v137/mitrovic20a.html},
  abstract = {Unsupervised representation learning provides an attractive alternative to its supervised counterpart because of the abundance of unlabelled data. Contrastive learning has recently emerged as one of the most successful approaches to unsupervised representation learning. Given a datapoint, contrastive learning involves discriminating between a matching, or positive, datapoint and a number of non-matching, or negative, ones. Usually the other datapoints in the batch serve as the negatives for the given datapoint. It has been shown empirically that large batch sizes are needed to achieve good performance, which led to the belief that a large number of negatives is preferable. In order to understand this phenomenon better, in this work we investigate the role of negatives in contrastive learning by decoupling the number of negatives from the batch size. Surprisingly, we discover that for a fixed batch size, performance actually degrades as the number of negatives is increased. We also show that using fewer negatives can lead to a better signal-to-noise ratio for the model gradients, which could explain the improved performance.}
}
Endnote
%0 Conference Paper
%T Less can be more in contrastive learning
%A Jovana Mitrovic
%A Brian McWilliams
%A Melanie Rey
%B Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops
%C Proceedings of Machine Learning Research
%D 2020
%E Jessica Zosa Forde
%E Francisco Ruiz
%E Melanie F. Pradier
%E Aaron Schein
%F pmlr-v137-mitrovic20a
%I PMLR
%P 70--75
%U https://proceedings.mlr.press/v137/mitrovic20a.html
%V 137
%X Unsupervised representation learning provides an attractive alternative to its supervised counterpart because of the abundance of unlabelled data. Contrastive learning has recently emerged as one of the most successful approaches to unsupervised representation learning. Given a datapoint, contrastive learning involves discriminating between a matching, or positive, datapoint and a number of non-matching, or negative, ones. Usually the other datapoints in the batch serve as the negatives for the given datapoint. It has been shown empirically that large batch sizes are needed to achieve good performance, which led to the belief that a large number of negatives is preferable. In order to understand this phenomenon better, in this work we investigate the role of negatives in contrastive learning by decoupling the number of negatives from the batch size. Surprisingly, we discover that for a fixed batch size, performance actually degrades as the number of negatives is increased. We also show that using fewer negatives can lead to a better signal-to-noise ratio for the model gradients, which could explain the improved performance.
APA
Mitrovic, J., McWilliams, B. & Rey, M. (2020). Less can be more in contrastive learning. Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, in Proceedings of Machine Learning Research 137:70-75. Available from https://proceedings.mlr.press/v137/mitrovic20a.html.
