EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Chung-Yiu Yau; Hoi To Wai; Parameswaran Raman; Soumajyoti Sarkar; Mingyi Hong

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:56966-56981, 2024.

Abstract

A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an $\underline{\text{E}}$fficient $\underline{\text{M}}$arkov $\underline{\text{C}}$hain Monte Carlo negative sampling method for $\underline{\text{C}}$ontrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and ImageNet-100.
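
The abstract's central observation, that Metropolis-Hastings can draw from a softmax distribution without computing the partition function, can be illustrated with a minimal sketch. This is not the paper's EMC$^2$ algorithm: it assumes a fixed score vector and a uniform proposal, and the names `mh_softmax_sample`, `scores`, and `n_steps` are hypothetical. The acceptance ratio depends only on the difference between two scores, so the normalizing constant cancels.

```python
import math
import random


def mh_softmax_sample(scores, n_steps, state=0, rng=None):
    """Run n_steps of Metropolis-Hastings targeting softmax(scores).

    Illustrative assumptions: fixed scores, uniform independent proposal.
    Returns the list of visited states, one per step.
    """
    rng = rng or random.Random()
    chain = []
    for _ in range(n_steps):
        proposal = rng.randrange(len(scores))
        # Symmetric proposal, so accept with probability
        # min(1, p(proposal) / p(state)) = min(1, exp(s_proposal - s_state)).
        # The partition function cancels in this ratio.
        log_ratio = scores[proposal] - scores[state]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state = proposal
        chain.append(state)
    return chain


if __name__ == "__main__":
    scores = [0.0, 1.0, 2.0]
    chain = mh_softmax_sample(scores, n_steps=50_000, rng=random.Random(0))
    # The partition function is computed here only to check the sampler.
    z = sum(math.exp(s) for s in scores)
    for i, s in enumerate(scores):
        empirical = chain.count(i) / len(chain)
        print(i, round(empirical, 3), round(math.exp(s) / z, 3))
```

The sketch keeps the scores fixed, whereas the abstract describes a distribution that is updated during training; handling that change is the part the paper's adaptive subroutine addresses and is not modeled here.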

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-yau24a,
  title = 	 {{EMC}$^2$: Efficient {MCMC} Negative Sampling for Contrastive Learning with Global Convergence},
  author =       {Yau, Chung-Yiu and Wai, Hoi To and Raman, Parameswaran and Sarkar, Soumajyoti and Hong, Mingyi},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {56966--56981},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yau24a/yau24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/yau24a.html},
  abstract = 	 {A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which are dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an $\underline{\text{E}}$fficient $\underline{\text{M}}$arkov $\underline{\text{C}}$hain Monte Carlo negative sampling method for $\underline{\text{C}}$ontrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100.}
}

Endnote

%0 Conference Paper
%T EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
%A Chung-Yiu Yau
%A Hoi To Wai
%A Parameswaran Raman
%A Soumajyoti Sarkar
%A Mingyi Hong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-yau24a
%I PMLR
%P 56966--56981
%U https://proceedings.mlr.press/v235/yau24a.html
%V 235
%X A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which are dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an $\underline{\text{E}}$fficient $\underline{\text{M}}$arkov $\underline{\text{C}}$hain Monte Carlo negative sampling method for $\underline{\text{C}}$ontrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100.

APA

Yau, C., Wai, H.T., Raman, P., Sarkar, S. & Hong, M. (2024). EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:56966-56981. Available from https://proceedings.mlr.press/v235/yau24a.html.
