Relative Error Fair Clustering in the Weak-Strong Oracle Model

Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H.-C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:5418-5453, 2025.

Abstract

We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in the model is motivated by the important quest of achieving fairness with the presence of inaccurate information. We achieve the first $(1+\varepsilon)$-coresets for fair $k$-median clustering using $\text{poly}\left(\frac{k}{\varepsilon}\cdot\log n\right)$ queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints), and we could in fact obtain $(1+\varepsilon)$-coresets for $(k,z)$-clustering for general $z=O(1)$ with a similar number of strong oracle queries. In contrast, previous results achieved a constant-factor $(>10)$ approximation for the standard $k$-clustering problems, and no previous work considered the fair $k$-median clustering problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-braverman25a, title = {Relative Error Fair Clustering in the Weak-Strong Oracle Model}, author = {Braverman, Vladimir and Dharangutte, Prathamesh and Jiang, Shaofeng H.-C. and Nguyen, Hoai-An and Wang, Chen and Zhang, Yubo and Zhou, Samson}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {5418--5453}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/braverman25a/braverman25a.pdf}, url = {https://proceedings.mlr.press/v267/braverman25a.html}, abstract = {We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in the model is motivated by the important quest of achieving fairness with the presence of inaccurate information. We achieve the first $(1+\varepsilon)$-coresets for fair $k$-median clustering using $\text{poly}\left(\frac{k}{\varepsilon}\cdot\log n\right)$ queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints), and we could in fact obtain $(1+\varepsilon)$-coresets for $(k,z)$-clustering for general $z=O(1)$ with a similar number of strong oracle queries. In contrast, previous results achieved a constant-factor $(>10)$ approximation for the standard $k$-clustering problems, and no previous work considered the fair $k$-median clustering problem.} }
Endnote
%0 Conference Paper %T Relative Error Fair Clustering in the Weak-Strong Oracle Model %A Vladimir Braverman %A Prathamesh Dharangutte %A Shaofeng H.-C. Jiang %A Hoai-An Nguyen %A Chen Wang %A Yubo Zhang %A Samson Zhou %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-braverman25a %I PMLR %P 5418--5453 %U https://proceedings.mlr.press/v267/braverman25a.html %V 267 %X We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in the model is motivated by the important quest of achieving fairness with the presence of inaccurate information. We achieve the first $(1+\varepsilon)$-coresets for fair $k$-median clustering using $\text{poly}\left(\frac{k}{\varepsilon}\cdot\log n\right)$ queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints), and we could in fact obtain $(1+\varepsilon)$-coresets for $(k,z)$-clustering for general $z=O(1)$ with a similar number of strong oracle queries. In contrast, previous results achieved a constant-factor $(>10)$ approximation for the standard $k$-clustering problems, and no previous work considered the fair $k$-median clustering problem.
APA
Braverman, V., Dharangutte, P., Jiang, S.H., Nguyen, H., Wang, C., Zhang, Y. & Zhou, S.. (2025). Relative Error Fair Clustering in the Weak-Strong Oracle Model. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:5418-5453 Available from https://proceedings.mlr.press/v267/braverman25a.html.

Related Material