Reducing Representation Bias through Fairness-Driven Sampling in Contrastive Learning

David Maritn, Blessing Ogbuokiri
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1137-1144, 2026.

Abstract

Contrastive learning is a widely applicable self-supervised learning approach that has achieved state-of-the-art performance, often competing with supervised methods. However, its stochastic approach to sampling can amplify representation bias: overrepresented groups are more likely to dominate contrastive pair construction, while underrepresented groups receive limited exposure during training, leading to imbalanced subgroup representation and biased downstream performance. To address this issue, we propose a fairness-driven sampling algorithm that leverages latent similarity structure to infer subgroup information and guide positive and negative pair selection without relying on annotated demographic attributes. We evaluate the approach in terms of both representation fairness and utility. The results show that our fairness-driven sampling strategy not only increases the representation of underrepresented latent subgroups but also maintains accuracy competitive with baseline contrastive learning sampling. This method has the potential to improve fairness in downstream applications such as facial recognition, clinical diagnostics, and language models deployed in demographically diverse or low-resource contexts.
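The abstract does not say how subgroups are inferred or how pairs are chosen, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It illustrates the exposure argument: with uniform sampling, a subgroup holding a fraction p of the data receives roughly p·B of every B anchors, whereas inverse-frequency weights w_i = 1/(G·n_g) (G inferred subgroups, n_g members in the anchor's subgroup) give each subgroup roughly B/G. Assumptions: k-means with farthest-point initialisation stands in for "latent similarity structure"; negatives are drawn by first picking a subgroup uniformly; positives are nearest neighbours inside the anchor's own subgroup. All function names (`infer_subgroups`, `anchor_weights`, `sample_anchors`, `sample_negatives`, `pick_positive`) are hypothetical.

```python
import math
import random
from collections import Counter


def infer_subgroups(embeddings, k, iters=20):
    """Hypothetical helper: k-means on latent embeddings yields pseudo-subgroup
    labels, so no demographic annotations are needed."""
    points = [tuple(x) for x in embeddings]
    # Farthest-point initialisation: deterministic and spreads centres apart.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each non-empty cluster's centre moves to its members' mean.
        for j in range(k):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def anchor_weights(labels):
    """Inverse-frequency weights: each inferred subgroup gets total mass 1/G."""
    counts = Counter(labels)
    return [1.0 / (len(counts) * counts[g]) for g in labels]


def sample_anchors(labels, batch_size, rng):
    """Draw anchors with replacement; small subgroups are upweighted."""
    return rng.choices(range(len(labels)), weights=anchor_weights(labels), k=batch_size)


def sample_negatives(anchor, labels, n_neg, rng):
    """Pick a subgroup uniformly, then a member other than the anchor, so the
    majority subgroup cannot dominate the negative set."""
    by_group = {}
    for i, g in enumerate(labels):
        if i != anchor:
            by_group.setdefault(g, []).append(i)
    groups = list(by_group)
    return [rng.choice(by_group[rng.choice(groups)]) for _ in range(n_neg)]


def pick_positive(anchor, embeddings, labels):
    """Nearest neighbour inside the anchor's own subgroup (assumed rule)."""
    same = [i for i, g in enumerate(labels) if g == labels[anchor] and i != anchor]
    if not same:
        return anchor  # no neighbour: pair the anchor with its own augmented view
    a = tuple(embeddings[anchor])
    return min(same, key=lambda i: math.dist(tuple(embeddings[i]), a))


def minority_share(indices):
    """Fraction of sampled indices that fall in inferred subgroup 1."""
    return sum(labels[i] == 1 for i in indices) / len(indices)


# Toy latent space: 8 "majority" points near the origin, 2 "minority" points far away.
emb = [(0.1 * i, 0.0) for i in range(8)] + [(10.0, 10.0 + 0.1 * i) for i in range(2)]
labels = infer_subgroups(emb, k=2)
rng = random.Random(0)
uniform = rng.choices(range(len(emb)), k=10_000)  # baseline: uniform anchors
fair = sample_anchors(labels, 10_000, rng)        # inverse-frequency anchors
print(f"minority share, uniform: {minority_share(uniform):.2f}")
print(f"minority share, fair:    {minority_share(fair):.2f}")
negs = sample_negatives(0, labels, 1_000, rng)
pos = pick_positive(0, emb, labels)
```

On this toy data the two minority points receive about a fifth of the anchors under uniform sampling (their share of the data) and about half under the inverse-frequency weights, since each of the two inferred subgroups gets equal total weight. Whether the paper uses this weighting, this clustering method, or this positive rule is not stated in the abstract.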

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-maritn26a,
  title     = {Reducing Representation Bias through Fairness-Driven Sampling in Contrastive Learning},
  author    = {Maritn, David and Ogbuokiri, Blessing},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages     = {1137--1144},
  year      = {2026},
  editor    = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume    = {318},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--29 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/maritn26a/maritn26a.pdf},
  url       = {https://proceedings.mlr.press/v318/maritn26a.html}
}
Endnote
%0 Conference Paper
%T Reducing Representation Bias through Fairness-Driven Sampling in Contrastive Learning
%A David Maritn
%A Blessing Ogbuokiri
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-maritn26a
%I PMLR
%P 1137--1144
%U https://proceedings.mlr.press/v318/maritn26a.html
%V 318
APA
Maritn, D., & Ogbuokiri, B. (2026). Reducing Representation Bias through Fairness-Driven Sampling in Contrastive Learning. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1137-1144. Available from https://proceedings.mlr.press/v318/maritn26a.html.
