On Coresets for Fair Regression and Individually Fair Clustering

Rachit Chhaya, Anirban Dasgupta, Jayesh Choudhari, Supratim Shit
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9603-9625, 2022.

Abstract

In this paper we present coresets for Fair Regression with Statistical Parity (SP) constraints and for Individually Fair Clustering. Because of the fairness constraints, the classical coreset definition does not suffice for these problems, so we first define coresets for both problems. We show that such coresets can be obtained by sampling points with probabilities that combine each point's sensitivity score with a carefully chosen term derived from the fairness constraints. For both problems we give provable guarantees: the cost is preserved up to a relative error and the fairness constraints up to a small additive error. Since our coresets are much smaller than $n$, the number of points, they can yield large savings in computational cost (from polynomial to polylogarithmic in $n$), especially when $n \gg d$, where $d$ is the input dimension. We support our theoretical claims with experimental evaluations.
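To make the sampling idea concrete, the Python sketch below illustrates a generic sensitivity-based coreset construction for least-squares regression. It is only an illustration under simplifying assumptions, not the construction analyzed in the paper: leverage scores stand in for the sensitivity upper bounds, the fairness_term argument is a hypothetical placeholder for the paper's fairness-dependent term, and the two are simply added here, whereas the paper's combination and guarantees differ.

# Illustrative sketch only: generic sensitivity-style coreset sampling for
# least-squares regression. NOT the exact construction from the paper.
# Leverage scores upper-bound the l2 sensitivities; `fairness_term` is a
# hypothetical placeholder (one nonnegative value per point) for the
# "carefully chosen term" based on the fairness constraints.
import numpy as np

def coreset_by_sensitivity_sampling(X, y, fairness_term, m, rng=None):
    """Sample m rows of (X, y) with probabilities that mix leverage scores
    and a fairness-dependent term; return the sampled rows and weights."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # Leverage scores via a thin QR factorization: tau_i = ||Q_i||^2.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)

    # Combine the sensitivity upper bound with the fairness-related term
    # (added here for simplicity; the paper's combination differs).
    scores = leverage + fairness_term
    probs = scores / scores.sum()

    # Importance sampling with replacement, reweighted to stay unbiased.
    idx = rng.choice(n, size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return X[idx], y[idx], weights

A coreset built this way would then be used by solving the (constrained) regression problem on the m reweighted rows instead of all $n$ points, which is where the computational savings come from.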

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-chhaya22a,
  title     = {On Coresets for Fair Regression and Individually Fair Clustering},
  author    = {Chhaya, Rachit and Dasgupta, Anirban and Choudhari, Jayesh and Shit, Supratim},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {9603--9625},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/chhaya22a/chhaya22a.pdf},
  url       = {https://proceedings.mlr.press/v151/chhaya22a.html},
  abstract  = {In this paper we present coresets for Fair Regression with Statistical Parity (SP) constraints and for Individually Fair Clustering. Due to the fairness constraints, the classical coreset definition is not enough for these problems. We first define coresets for both the problems. We show that to obtain such coresets, it is sufficient to sample points based on the probabilities dependent on combination of sensitivity score and a carefully chosen term according to the fairness constraints. We give provable guarantees with relative error in preserving the cost and a small additive error in preserving fairness constraints for both problems. Since our coresets are much smaller in size as compared to $n$, the number of points, they can give huge benefits in computational costs (from polynomial to polylogarithmic in $n$), especially when $n \gg d$, where $d$ is the input dimension. We support our theoretical claims with experimental evaluations.}
}
Endnote
%0 Conference Paper
%T On Coresets for Fair Regression and Individually Fair Clustering
%A Rachit Chhaya
%A Anirban Dasgupta
%A Jayesh Choudhari
%A Supratim Shit
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-chhaya22a
%I PMLR
%P 9603--9625
%U https://proceedings.mlr.press/v151/chhaya22a.html
%V 151
%X In this paper we present coresets for Fair Regression with Statistical Parity (SP) constraints and for Individually Fair Clustering. Due to the fairness constraints, the classical coreset definition is not enough for these problems. We first define coresets for both the problems. We show that to obtain such coresets, it is sufficient to sample points based on the probabilities dependent on combination of sensitivity score and a carefully chosen term according to the fairness constraints. We give provable guarantees with relative error in preserving the cost and a small additive error in preserving fairness constraints for both problems. Since our coresets are much smaller in size as compared to $n$, the number of points, they can give huge benefits in computational costs (from polynomial to polylogarithmic in $n$), especially when $n \gg d$, where $d$ is the input dimension. We support our theoretical claims with experimental evaluations.
APA
Chhaya, R., Dasgupta, A., Choudhari, J. & Shit, S. (2022). On Coresets for Fair Regression and Individually Fair Clustering. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:9603-9625. Available from https://proceedings.mlr.press/v151/chhaya22a.html.
