[edit]
On Coresets for Fair Regression and Individually Fair Clustering
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9603-9625, 2022.
Abstract
In this paper we present coresets for Fair Regression with Statistical Parity (SP) constraints and for Individually Fair Clustering. Due to the fairness constraints, the classical coreset definition is not enough for these problems. We first define coresets for both the problems. We show that to obtain such coresets, it is sufficient to sample points based on the probabilities dependent on combination of sensitivity score and a carefully chosen term according to the fairness constraints. We give provable guarantees with relative error in preserving the cost and a small additive error in preserving fairness constraints for both problems. Since our coresets are much smaller in size as compared to $n$, the number of points, they can give huge benefits in computational costs (from polynomial to polylogarithmic in $n$), especially when $n \gg d$, where $d$ is the input dimension. We support our theoretical claims with experimental evaluations.