EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Chung-Yiu Yau, Hoi To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:56966-56981, 2024.

Abstract

A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, so as to learn better encodings of the data. These negative samples often follow a softmax distribution that is dynamically updated during training. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an $\underline{\text{E}}$fficient $\underline{\text{M}}$arkov $\underline{\text{C}}$hain Monte Carlo negative sampling method for $\underline{\text{C}}$ontrastive learning (EMC$^2$). We follow the global contrastive learning loss introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size, while having low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small-batch training and achieves comparable or better performance than baseline algorithms. We report results for pre-training image encoders on STL-10 and ImageNet-100.
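
To make the sampling idea concrete, below is a minimal NumPy sketch of a Metropolis-Hastings transition for drawing a hardness-aware negative index whose target distribution is the softmax over similarity scores, p_i(j) proportional to exp(sim(z_i, z_j)/tau). This is an illustration under assumed details (cosine-similarity scores, a uniform symmetric proposal, a fixed embedding bank, and illustrative function and variable names), not the paper's exact adaptive subroutine. The point it demonstrates is that the acceptance ratio depends only on unnormalized scores, so the partition function is never computed.

# Illustrative sketch (not the paper's exact algorithm): one or a few
# Metropolis-Hastings transitions of a per-anchor chain over negative indices.
# Target: p_i(j) proportional to exp(sim(z_i, z_j) / tau) for j != i.
# With a symmetric (uniform) proposal, the acceptance ratio needs only the
# unnormalized target, so the partition function cancels out.
import numpy as np

def mh_negative_sample(anchor_emb, bank_embs, state_j, anchor_idx,
                       tau=0.1, n_steps=1, rng=None):
    """Advance the per-anchor negative-sampling chain and return its new state.

    anchor_emb : (d,)   embedding of the anchor (assumed L2-normalized)
    bank_embs  : (N, d) embeddings of the candidate pool (assumed L2-normalized)
    state_j    : current negative index carried over from the previous iteration
    anchor_idx : index of the anchor itself, excluded from the negatives
    """
    rng = rng or np.random.default_rng()
    N = bank_embs.shape[0]
    s_cur = anchor_emb @ bank_embs[state_j] / tau  # unnormalized log-target
    for _ in range(n_steps):
        # Symmetric (uniform) proposal over all indices except the anchor.
        prop = rng.integers(N - 1)
        prop = prop + 1 if prop >= anchor_idx else prop
        s_prop = anchor_emb @ bank_embs[prop] / tau
        # Accept with probability min(1, exp(s_prop - s_cur)).
        if np.log(rng.random()) < s_prop - s_cur:
            state_j, s_cur = prop, s_prop
    return state_j  # hardness-aware negative; also the chain state for next time

Keeping the chain state across optimizer iterations is what makes the procedure online: each training step only needs a handful of similarity evaluations rather than a pass over the full candidate pool.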

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yau24a,
  title     = {{EMC}$^2$: Efficient {MCMC} Negative Sampling for Contrastive Learning with Global Convergence},
  author    = {Yau, Chung-Yiu and Wai, Hoi To and Raman, Parameswaran and Sarkar, Soumajyoti and Hong, Mingyi},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {56966--56981},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yau24a/yau24a.pdf},
  url       = {https://proceedings.mlr.press/v235/yau24a.html}
}
Endnote
%0 Conference Paper
%T EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
%A Chung-Yiu Yau
%A Hoi To Wai
%A Parameswaran Raman
%A Soumajyoti Sarkar
%A Mingyi Hong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-yau24a
%I PMLR
%P 56966--56981
%U https://proceedings.mlr.press/v235/yau24a.html
%V 235
APA
Yau, C., Wai, H. T., Raman, P., Sarkar, S., & Hong, M. (2024). EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:56966-56981. Available from https://proceedings.mlr.press/v235/yau24a.html.
