Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes

Zhaowei Zhu, Yuanshun Yao, Jiankai Sun, Hang Li, Yang Liu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:43258-43288, 2023.

Abstract

Evaluating fairness can be challenging in practice because the sensitive attributes of data are often inaccessible due to privacy constraints. The go-to approach that the industry frequently adopts is using off-the-shelf proxy models to predict the missing sensitive attributes, e.g. Meta (Alao et al., 2021) and Twitter (Belli et al., 2022). Despite its popularity, three important questions remain unanswered: (1) Is directly using proxies efficacious in measuring fairness? (2) If not, is it possible to accurately evaluate fairness using proxies only? (3) Given the ethical controversy over inferring user private information, is it possible to only use weak (i.e. inaccurate) proxies in order to protect privacy? First, our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness. Second, we develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies. Third, we show that our algorithm allows the use of only weak proxies (e.g. with only 68.85% accuracy on COMPAS), adding an extra layer of protection on user privacy. Experiments validate our theoretical analyses and show our algorithm can effectively measure and mitigate bias. Our results imply a set of practical guidelines for practitioners on how to use proxies properly. Code is available at https://github.com/UCSC-REAL/fair-eval.
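To make the abstract's first finding concrete, here is a minimal, self-contained Python sketch (not the paper's algorithm) of the naive plug-in approach it critiques: predict the missing sensitive attribute with a weak proxy, then compute demographic parity on the predicted groups. Every number and name below (a 70%-accurate proxy, a true parity gap of 0.10, the helper dp_gap) is an illustrative assumption; the correction at the end further assumes balanced groups and a known, symmetric proxy error rate, which is precisely the kind of information the paper's algorithm recovers from three properly identified proxies without ever observing the true attribute.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True (unobserved) sensitive attribute A, balanced across the population,
# and model decisions Yhat with a built-in demographic parity gap of 0.10.
A = rng.binomial(1, 0.5, n)
Yhat = rng.binomial(1, np.where(A == 1, 0.55, 0.45))

# Weak proxy for A: agrees with A with probability 0.70, independently of Yhat.
acc = 0.70
A_proxy = np.where(rng.random(n) < acc, A, 1 - A)

def dp_gap(yhat, group):
    # Demographic parity gap: |P(Yhat=1 | g=1) - P(Yhat=1 | g=0)|.
    return abs(yhat[group == 1].mean() - yhat[group == 0].mean())

true_gap = dp_gap(Yhat, A)         # ~0.100, computable only with true A
naive_gap = dp_gap(Yhat, A_proxy)  # ~0.040: proxy noise masks unfairness

# Under these toy assumptions the naive estimate is attenuated by a factor
# (1 - 2e), where e = 1 - acc is the proxy's flip rate, so it can be
# de-biased when e is known:
e = 1 - acc
corrected_gap = naive_gap / (1 - 2 * e)  # ~0.100 again

print(f"true DP gap     : {true_gap:.3f}")
print(f"naive proxy gap : {naive_gap:.3f}")
print(f"corrected gap   : {corrected_gap:.3f}")

In this toy setting a 70%-accurate proxy reports a gap of roughly 0.04 for a model whose true gap is 0.10, a false sense of fairness; if instead the proxy's errors correlate with the model's decisions, the naive estimate can exaggerate the gap, a false sense of unfairness. The practical difficulty, and the subject of the paper, is estimating the needed correction when the true attribute is never observed.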

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhu23n,
  title     = {Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes},
  author    = {Zhu, Zhaowei and Yao, Yuanshun and Sun, Jiankai and Li, Hang and Liu, Yang},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {43258--43288},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/zhu23n/zhu23n.pdf},
  url       = {https://proceedings.mlr.press/v202/zhu23n.html},
  abstract  = {Evaluating fairness can be challenging in practice because the sensitive attributes of data are often inaccessible due to privacy constraints. The go-to approach that the industry frequently adopts is using off-the-shelf proxy models to predict the missing sensitive attributes, e.g. Meta (Alao et al., 2021) and Twitter (Belli et al., 2022). Despite its popularity, three important questions remain unanswered: (1) Is directly using proxies efficacious in measuring fairness? (2) If not, is it possible to accurately evaluate fairness using proxies only? (3) Given the ethical controversy over inferring user private information, is it possible to only use weak (i.e. inaccurate) proxies in order to protect privacy? First, our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness. Second, we develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies. Third, we show that our algorithm allows the use of only weak proxies (e.g. with only 68.85% accuracy on COMPAS), adding an extra layer of protection on user privacy. Experiments validate our theoretical analyses and show our algorithm can effectively measure and mitigate bias. Our results imply a set of practical guidelines for practitioners on how to use proxies properly. Code is available at https://github.com/UCSC-REAL/fair-eval.}
}
Endnote
%0 Conference Paper
%T Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
%A Zhaowei Zhu
%A Yuanshun Yao
%A Jiankai Sun
%A Hang Li
%A Yang Liu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhu23n
%I PMLR
%P 43258--43288
%U https://proceedings.mlr.press/v202/zhu23n.html
%V 202
%X Evaluating fairness can be challenging in practice because the sensitive attributes of data are often inaccessible due to privacy constraints. The go-to approach that the industry frequently adopts is using off-the-shelf proxy models to predict the missing sensitive attributes, e.g. Meta (Alao et al., 2021) and Twitter (Belli et al., 2022). Despite its popularity, three important questions remain unanswered: (1) Is directly using proxies efficacious in measuring fairness? (2) If not, is it possible to accurately evaluate fairness using proxies only? (3) Given the ethical controversy over inferring user private information, is it possible to only use weak (i.e. inaccurate) proxies in order to protect privacy? First, our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness. Second, we develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies. Third, we show that our algorithm allows the use of only weak proxies (e.g. with only 68.85% accuracy on COMPAS), adding an extra layer of protection on user privacy. Experiments validate our theoretical analyses and show our algorithm can effectively measure and mitigate bias. Our results imply a set of practical guidelines for practitioners on how to use proxies properly. Code is available at https://github.com/UCSC-REAL/fair-eval.
APA
Zhu, Z., Yao, Y., Sun, J., Li, H. & Liu, Y. (2023). Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:43258-43288. Available from https://proceedings.mlr.press/v202/zhu23n.html.
