A Scalable Algorithm for Individually Fair k-Means Clustering

MohammadHossein Bateni, Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3151-3159, 2024.

Abstract

We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in  $O(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-bateni24a, title = {A Scalable Algorithm for Individually Fair k-Means Clustering}, author = {Bateni, MohammadHossein and Cohen-Addad, Vincent and Epasto, Alessandro and Lattanzi, Silvio}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {3151--3159}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/bateni24a/bateni24a.pdf}, url = {https://proceedings.mlr.press/v238/bateni24a.html}, abstract = {We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in  $O(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.} }
Endnote
%0 Conference Paper %T A Scalable Algorithm for Individually Fair k-Means Clustering %A MohammadHossein Bateni %A Vincent Cohen-Addad %A Alessandro Epasto %A Silvio Lattanzi %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-bateni24a %I PMLR %P 3151--3159 %U https://proceedings.mlr.press/v238/bateni24a.html %V 238 %X We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in  $O(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
APA
Bateni, M., Cohen-Addad, V., Epasto, A. & Lattanzi, S.. (2024). A Scalable Algorithm for Individually Fair k-Means Clustering. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3151-3159 Available from https://proceedings.mlr.press/v238/bateni24a.html.

Related Material