Fair Clustering Using Antidote Data

Anshuman Chhabra; Adish Singla; Prasant Mohapatra

Fair Clustering Using Antidote Data

Anshuman Chhabra, Adish Singla, Prasant Mohapatra

Proceedings of The Algorithmic Fairness through the Lens of Causality and Robustness, PMLR 171:19-39, 2022.

Abstract

Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to group-level fairness in center-based clustering inspired by research on data poisoning attacks. We seek to augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. We then categorize approaches for solving this bi-level optimization for two different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.

Cite this Paper

BibTeX

@InProceedings{pmlr-v171-chhabra22a,
  title = 	 {Fair Clustering Using Antidote Data},
  author =       {Chhabra, Anshuman and Singla, Adish and Mohapatra, Prasant},
  booktitle = 	 {Proceedings of The Algorithmic Fairness through the Lens of Causality and Robustness},
  pages = 	 {19--39},
  year = 	 {2022},
  editor = 	 {Schrouff, Jessica and Dieng, Awa and Rateike, Miriam and Kwegyir-Aggrey, Kweku and Farnadi, Golnoosh},
  volume = 	 {171},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v171/chhabra22a/chhabra22a.pdf},
  url = 	 {https://proceedings.mlr.press/v171/chhabra22a.html},
  abstract = 	 {Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to group-level fairness in center-based clustering inspired by research on data poisoning attacks. We seek to augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. We then categorize approaches for solving this bi-level optimization for two different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.}
}

Endnote

%0 Conference Paper
%T Fair Clustering Using Antidote Data
%A Anshuman Chhabra
%A Adish Singla
%A Prasant Mohapatra
%B Proceedings of The Algorithmic Fairness through the Lens of Causality and Robustness
%C Proceedings of Machine Learning Research
%D 2022
%E Jessica Schrouff
%E Awa Dieng
%E Miriam Rateike
%E Kweku Kwegyir-Aggrey
%E Golnoosh Farnadi	
%F pmlr-v171-chhabra22a
%I PMLR
%P 19--39
%U https://proceedings.mlr.press/v171/chhabra22a.html
%V 171
%X Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to group-level fairness in center-based clustering inspired by research on data poisoning attacks. We seek to augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. We then categorize approaches for solving this bi-level optimization for two different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.

APA

Chhabra, A., Singla, A. & Mohapatra, P.. (2022). Fair Clustering Using Antidote Data. Proceedings of The Algorithmic Fairness through the Lens of Causality and Robustness, in Proceedings of Machine Learning Research 171:19-39 Available from https://proceedings.mlr.press/v171/chhabra22a.html.

Fair Clustering Using Antidote Data

Abstract

Cite this Paper

Related Material