Fair k-Center Clustering for Data Summarization

Matthäus Kleindessner; Pranjal Awasthi; Jamie Morgenstern

Fair k-Center Clustering for Data Summarization

Matthäus Kleindessner, Pranjal Awasthi, Jamie Morgenstern

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3448-3457, 2019.

Abstract

In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-kleindessner19a,
  title = 	 {Fair k-Center Clustering for Data Summarization},
  author =       {Kleindessner, Matth{\"a}us and Awasthi, Pranjal and Morgenstern, Jamie},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {3448--3457},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/kleindessner19a/kleindessner19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/kleindessner19a.html},
  abstract = 	 {In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.}
}

Endnote

%0 Conference Paper
%T Fair k-Center Clustering for Data Summarization
%A Matthäus Kleindessner
%A Pranjal Awasthi
%A Jamie Morgenstern
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov	
%F pmlr-v97-kleindessner19a
%I PMLR
%P 3448--3457
%U https://proceedings.mlr.press/v97/kleindessner19a.html
%V 97
%X In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.

APA

Kleindessner, M., Awasthi, P. & Morgenstern, J.. (2019). Fair k-Center Clustering for Data Summarization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3448-3457 Available from https://proceedings.mlr.press/v97/kleindessner19a.html.

Fair k-Center Clustering for Data Summarization

Abstract

Cite this Paper

Related Material