Inherent Limitations of Multi-Task Fair Representations

Tosca Lechner, Shai Ben-David
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:583-603, 2022.

Abstract

With the growing awareness of fairness in machine learning and the realization of the central role that data representation plays in data-processing tasks, there is an obvious interest in notions of fair data representations. The goal of such representations is that a model trained on data under the representation (e.g., a classifier) is guaranteed to respect some fairness constraints while still being expressive enough to model the task well. Such representations are useful when they can be fixed for training models on various different tasks, and also when they serve as a data filter between the raw data (available to the representation designer) and potentially malicious agents that use the data under the representation to learn predictive models and make decisions. A long list of recent research papers strives to provide tools for achieving these goals. However, we prove that in most cases such goals are unattainable! Roughly stated, we prove that no representation can guarantee the fairness of classifiers for different tasks trained using it (while retaining the needed expressive power). The reasons for this impossibility depend on the notion of fairness one aims to achieve. For the basic, ground-truth-independent notion of Demographic (or Statistical) Parity, the obstacle is conceptual: a representation that guarantees such fairness inevitably depends on the marginal (unlabeled) distribution of the relevant instances, and in most cases that distribution changes from one task to another. For more refined notions of fairness that depend on some ground-truth classification, such as Equalized Odds (which requires equal error rates across groups), fairness cannot be guaranteed by a representation that does not take into account the task-specific labeling rule with respect to which such fairness will be evaluated (even if the marginal data distribution is known a priori). Furthermore, for tasks sharing the same marginal distribution, we prove that, except for trivial cases, no representation can guarantee Equalized Odds fairness for any two different tasks while enabling accurate label predictions for both.
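For readers comparing the two fairness notions the abstract contrasts, the standard population-level definitions are sketched below; the notation (marginal distribution D, protected groups A_0 and A_1, classifier h, and task labeling rule l) is ours and may differ in detail from the formalism used in the paper itself.

% A minimal LaTeX sketch of the two fairness criteria discussed in the abstract,
% stated for a binary classifier h : X -> {0,1}, a ground-truth labeling l : X -> {0,1},
% protected groups A_0, A_1 \subseteq X, and a marginal distribution D over X.
\begin{align*}
  &\text{Demographic Parity:}
    && \Pr_{x \sim D}\bigl[h(x)=1 \mid x \in A_0\bigr]
       = \Pr_{x \sim D}\bigl[h(x)=1 \mid x \in A_1\bigr],\\
  &\text{Equalized Odds:}
    && \Pr_{x \sim D}\bigl[h(x)=1 \mid l(x)=y,\ x \in A_0\bigr]
       = \Pr_{x \sim D}\bigl[h(x)=1 \mid l(x)=y,\ x \in A_1\bigr]
       \quad \text{for } y \in \{0,1\}.
\end{align*}

Written this way, the two obstacles described in the abstract are visible in the conditions themselves: both sides of the Demographic Parity equality are probabilities under the marginal distribution D, so any representation enforcing it must be tuned to D, which typically changes from task to task; Equalized Odds additionally conditions on the task's labeling rule l, so a representation fixed before l is known cannot guarantee it either.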

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-lechner22a,
  title     = {Inherent Limitations of Multi-Task Fair Representations},
  author    = {Lechner, Tosca and Ben-David, Shai},
  booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents},
  pages     = {583--603},
  year      = {2022},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina},
  volume    = {199},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v199/lechner22a/lechner22a.pdf},
  url       = {https://proceedings.mlr.press/v199/lechner22a.html}
}
APA
Lechner, T. & Ben-David, S. (2022). Inherent Limitations of Multi-Task Fair Representations. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:583-603. Available from https://proceedings.mlr.press/v199/lechner22a.html.
