Imposing Fairness Constraints in Synthetic Data Generation

Mahed Abroshan, Andrew Elliott, Mohammad Mahdi Khalili
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2269-2277, 2024.

Abstract

In several real-world applications (e.g., online advertising and item recommendation), it may not be possible to release and share the real dataset due to privacy concerns. As a result, synthetic data generation (SDG) has emerged as a promising solution for data sharing. While the main goal of private SDG is to create a dataset that preserves the privacy of the individuals contributing to it, the use of synthetic data also creates an opportunity to improve fairness. Because datasets often contain historical biases, training on the original real data can lead to an unfair model. With synthetic data, we can attempt to remove such biases from the dataset before releasing it. In this work, we formalize the definition of fairness in synthetic data generation and provide a general framework for achieving it. We then show how the framework applies to two fairness notions: counterfactual fairness and information filtering fairness.
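The abstract does not describe the framework itself, so the sketch below is only a rough, hedged illustration of what an "information filtering" style constraint can mean in synthetic data generation: the released non-sensitive features are generated so that they are (marginally) independent of the sensitive attribute. The toy data, the per-group vs. pooled Gaussian modeling choice, and all variable names are illustrative assumptions, not the authors' construction.

# Minimal sketch (not the authors' method): a toy illustration of an
# information-filtering-style constraint in synthetic data generation.
# Assumption: tabular data with one binary sensitive attribute s and one
# real-valued feature whose distribution differs across groups.
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data: the feature is shifted for group s=1 (a historical bias).
n = 1000
s_real = rng.integers(0, 2, size=n)                      # sensitive attribute
x_real = rng.normal(loc=2.0 * s_real, scale=1.0, size=n)  # biased feature

def fit_gaussian(x):
    """Fit a one-dimensional Gaussian by its sample mean and std."""
    return x.mean(), x.std()

# Naive SDG: resample each group from its own fitted Gaussian,
# reproducing the bias in the synthetic data.
mu0, sd0 = fit_gaussian(x_real[s_real == 0])
mu1, sd1 = fit_gaussian(x_real[s_real == 1])
s_syn = rng.integers(0, 2, size=n)
x_naive = np.where(s_syn == 0,
                   rng.normal(mu0, sd0, size=n),
                   rng.normal(mu1, sd1, size=n))

# "Filtered" SDG: sample the feature from the pooled (group-independent)
# model, so the synthetic feature carries no information about s.
mu_pool, sd_pool = fit_gaussian(x_real)
x_fair = rng.normal(mu_pool, sd_pool, size=n)

# Crude check: the feature-vs-s correlation should be large for the naive
# generator and near zero for the filtered one.
print("corr(x, s) naive:    %.3f" % np.corrcoef(x_naive, s_syn)[0, 1])
print("corr(x, s) filtered: %.3f" % np.corrcoef(x_fair, s_syn)[0, 1])

In practice the paper targets richer generative models and formal fairness definitions (including counterfactual fairness); this sketch only conveys the intuition that the generator, rather than a downstream classifier, can be the place where the bias is removed.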

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-abroshan24a,
  title     = {Imposing Fairness Constraints in Synthetic Data Generation},
  author    = {Abroshan, Mahed and Elliott, Andrew and Mahdi Khalili, Mohammad},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2269--2277},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/abroshan24a/abroshan24a.pdf},
  url       = {https://proceedings.mlr.press/v238/abroshan24a.html}
}
Endnote
%0 Conference Paper
%T Imposing Fairness Constraints in Synthetic Data Generation
%A Mahed Abroshan
%A Andrew Elliott
%A Mohammad Mahdi Khalili
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-abroshan24a
%I PMLR
%P 2269--2277
%U https://proceedings.mlr.press/v238/abroshan24a.html
%V 238
APA
Abroshan, M., Elliott, A. & Mahdi Khalili, M. (2024). Imposing Fairness Constraints in Synthetic Data Generation. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2269-2277. Available from https://proceedings.mlr.press/v238/abroshan24a.html.