Fair k-Center Clustering for Data Summarization

Matthäus Kleindessner, Pranjal Awasthi, Jamie Morgenstern
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3448-3457, 2019.

Abstract

In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-kleindessner19a, title = {Fair k-Center Clustering for Data Summarization}, author = {Kleindessner, Matth{\"a}us and Awasthi, Pranjal and Morgenstern, Jamie}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {3448--3457}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/kleindessner19a/kleindessner19a.pdf}, url = {https://proceedings.mlr.press/v97/kleindessner19a.html}, abstract = {In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.} }
Endnote
%0 Conference Paper %T Fair k-Center Clustering for Data Summarization %A Matthäus Kleindessner %A Pranjal Awasthi %A Jamie Morgenstern %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-kleindessner19a %I PMLR %P 3448--3457 %U https://proceedings.mlr.press/v97/kleindessner19a.html %V 97 %X In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.
APA
Kleindessner, M., Awasthi, P. & Morgenstern, J.. (2019). Fair k-Center Clustering for Data Summarization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3448-3457 Available from https://proceedings.mlr.press/v97/kleindessner19a.html.

Related Material