Scalable Fair Clustering

Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:405-413, 2019.

Abstract

We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard $k$-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-backurs19a, title = {Scalable Fair Clustering}, author = {Backurs, Arturs and Indyk, Piotr and Onak, Krzysztof and Schieber, Baruch and Vakilian, Ali and Wagner, Tal}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {405--413}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/backurs19a/backurs19a.pdf}, url = {https://proceedings.mlr.press/v97/backurs19a.html}, abstract = {We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard $k$-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.} }
Endnote
%0 Conference Paper %T Scalable Fair Clustering %A Arturs Backurs %A Piotr Indyk %A Krzysztof Onak %A Baruch Schieber %A Ali Vakilian %A Tal Wagner %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-backurs19a %I PMLR %P 405--413 %U https://proceedings.mlr.press/v97/backurs19a.html %V 97 %X We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard $k$-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.
APA
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A. & Wagner, T.. (2019). Scalable Fair Clustering. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:405-413 Available from https://proceedings.mlr.press/v97/backurs19a.html.

Related Material