Reducing Representation Bias through Fairness-Driven Sampling in Contrastive Learning
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1137-1144, 2026.
Abstract
Contrastive learning is a widely applicable self-supervised learning approach that has demonstrated state-of-the-art performance, often competing with supervised learning methods. However, its stochastic approach to sampling can inherently amplify representation bias: overrepresented groups are more likely to dominate contrastive pair construction, while underrepresented groups receive limited exposure during training, leading to imbalanced subgroup representation and biased downstream performance. To address this issue, we propose a fairness-driven sampling algorithm that leverages latent similarity structure to infer subgroup information and guide positive and negative pair selection without relying on annotated demographic attributes. We evaluate the approach in terms of both fairness of representation and utility. The results show that our fairness-driven sampling strategy not only increases representation across underrepresented latent subgroups, but also maintains accuracy competitive with baseline contrastive learning sampling. This method has the potential to improve fairness in downstream applications such as facial recognition, clinical diagnostics, and language models deployed in demographically diverse or low-resource contexts.
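The abstract does not spell out how subgroups are inferred or how pairs are drawn, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that latent subgroups can be approximated by clustering embeddings (a toy k-means here) and that anchors are then drawn with inverse-frequency weights so that each inferred subgroup receives equal sampling mass. The function names `infer_subgroups`, `balanced_weights`, and `sample_anchors` are hypothetical.

```python
import math
import random
from collections import Counter


def infer_subgroups(embeddings, k, n_iter=20, seed=0):
    """Toy k-means (an assumption): assign each embedding to one of k latent subgroups."""
    rng = random.Random(seed)
    # Initialise centers from k distinct data points.
    centers = [list(e) for e in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(e, centers[j]))
            for e in embeddings
        ]
        # Update step: move each non-empty center to the mean of its members.
        for j in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def balanced_weights(labels):
    """Inverse-frequency weights: every inferred subgroup gets equal total mass."""
    counts = Counter(labels)
    n_groups = len(counts)
    return [1.0 / (n_groups * counts[lab]) for lab in labels]


def sample_anchors(labels, batch_size, seed=0):
    """Draw anchor indices so that small subgroups are not crowded out."""
    rng = random.Random(seed)
    weights = balanced_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical data: 90 points near the origin, 10 points near (10, 10).
    majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
    minority = [[data_rng.gauss(10, 1), data_rng.gauss(10, 1)] for _ in range(10)]
    embeddings = majority + minority

    labels = infer_subgroups(embeddings, k=2)
    batch = sample_anchors(labels, batch_size=64)
    print(Counter(labels[i] for i in batch))
```

Under uniform sampling, a subgroup holding 10% of the data would supply roughly 10% of anchors; with the weights above, each inferred subgroup is expected to supply an equal share. Extending the same weighting to positive and negative pair selection would be a further design choice that this sketch does not cover.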