Uncovering Latent Subgroups: Spectral Clustering for Fairness Analysis in Contrastive Embeddings

Hridoy Rahman, Blessing Ogbuokiri
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1076-1083, 2026.

Abstract

Contrastive learning enables scalable representation learning in computer vision and healthcare, yet embedding spaces may encode unequal geometric structure across latent subpopulations, leading to downstream performance disparities. Conventional fairness audits relying on demographic labels often fail to detect such structural bias. This work examines whether contrastive embeddings contain fairness-relevant latent subgroups that can be identified without demographic supervision. We introduce a label-free spectral fairness audit that constructs similarity graphs over CLIP embeddings and applies eigengap-based spectral clustering. Experiments on CheXpert reveal stable latent subgroups with noticeable geometric distortions and performance gaps, exposing hidden fairness risks missed by demographic-based evaluations. This work enables label-free discovery of hidden fairness and reliability risks in contrastive embeddings, supporting safer, more transparent deployment of foundation models in healthcare and other high-stakes domains.
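
The audit described in the abstract (a similarity graph over embeddings, followed by eigengap-based spectral clustering) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the cosine kNN graph, the symmetric normalized Laplacian, the small k-means routine, and every function and parameter name are assumptions made here for illustration. The embeddings are assumed to be precomputed (for example, CLIP image features) and passed in as a NumPy array.

```python
import numpy as np

def cosine_knn_graph(X, n_neighbors=10):
    """Symmetric kNN affinity matrix from cosine similarity (assumed graph type)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # exclude self-matches
    k = min(n_neighbors, len(X) - 1)
    nbrs = np.argsort(-S, axis=1)[:, :k]      # k most similar points per row
    rows = np.repeat(np.arange(len(X)), k)
    W = np.zeros((len(X), len(X)))
    W[rows, nbrs.ravel()] = np.clip(S[rows, nbrs.ravel()], 0.0, None)
    return np.maximum(W, W.T)                 # symmetrize

def normalized_laplacian_spectrum(W):
    """Eigenvalues (ascending) and eigenvectors of I - D^{-1/2} W D^{-1/2}."""
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)

def eigengap_k(evals, max_k=10):
    """Pick k at the largest gap between consecutive small eigenvalues."""
    max_k = min(max_k, len(evals) - 1)
    gaps = np.diff(evals[: max_k + 1])
    return int(np.argmax(gaps)) + 1

def kmeans(Z, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        new_C = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels

def spectral_audit_clusters(X, n_neighbors=10, max_k=10):
    """Label-free subgroup discovery: graph -> spectrum -> eigengap -> k-means."""
    W = cosine_knn_graph(X, n_neighbors)
    evals, evecs = normalized_laplacian_spectrum(W)
    k = eigengap_k(evals, max_k)
    Z = evecs[:, :k]
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)  # row-normalize
    return kmeans(Z, k), k, evals
```

Under these assumptions, calling `spectral_audit_clusters(embeddings)` returns one cluster label per sample, the number of clusters chosen by the eigengap, and the Laplacian eigenvalues. A fairness audit in the spirit of the abstract would then compare a downstream metric (for example, per-cluster accuracy) across the discovered groups.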

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-rahman26b,
  title = {Uncovering Latent Subgroups: Spectral Clustering for Fairness Analysis in Contrastive Embeddings},
  author = {Rahman, Hridoy and Ogbuokiri, Blessing},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {1076--1083},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/rahman26b/rahman26b.pdf},
  url = {https://proceedings.mlr.press/v318/rahman26b.html},
  abstract = {Contrastive learning enables scalable representation learning in computer vision and healthcare, yet embedding spaces may encode unequal geometric structure across latent subpopulations, leading to downstream performance disparities. Conventional fairness audits relying on demographic labels often fail to detect such structural bias. This work examines whether contrastive embeddings contain fairness-relevant latent subgroups that can be identified without demographic supervision. We introduce a label-free spectral fairness audit that constructs similarity graphs over CLIP embeddings and applies eigengap-based spectral clustering. Experiments on CheXpert reveal stable latent subgroups with noticeable geometric distortions and performance gaps, exposing hidden fairness risks missed by demographic-based evaluations. This work enables label-free discovery of hidden fairness and reliability risks in contrastive embeddings, supporting safer, more transparent deployment of foundation models in healthcare and other high-stakes domains.}
}
Endnote
%0 Conference Paper
%T Uncovering Latent Subgroups: Spectral Clustering for Fairness Analysis in Contrastive Embeddings
%A Hridoy Rahman
%A Blessing Ogbuokiri
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-rahman26b
%I PMLR
%P 1076--1083
%U https://proceedings.mlr.press/v318/rahman26b.html
%V 318
%X Contrastive learning enables scalable representation learning in computer vision and healthcare, yet embedding spaces may encode unequal geometric structure across latent subpopulations, leading to downstream performance disparities. Conventional fairness audits relying on demographic labels often fail to detect such structural bias. This work examines whether contrastive embeddings contain fairness-relevant latent subgroups that can be identified without demographic supervision. We introduce a label-free spectral fairness audit that constructs similarity graphs over CLIP embeddings and applies eigengap-based spectral clustering. Experiments on CheXpert reveal stable latent subgroups with noticeable geometric distortions and performance gaps, exposing hidden fairness risks missed by demographic-based evaluations. This work enables label-free discovery of hidden fairness and reliability risks in contrastive embeddings, supporting safer, more transparent deployment of foundation models in healthcare and other high-stakes domains.
APA
Rahman, H. & Ogbuokiri, B. (2026). Uncovering Latent Subgroups: Spectral Clustering for Fairness Analysis in Contrastive Embeddings. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1076-1083. Available from https://proceedings.mlr.press/v318/rahman26b.html.
