Fair Densities via Boosting the Sufficient Statistics of Exponential Families

Alexander Soen, Hisham Husain, Richard Nock
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:32105-32144, 2023.

Abstract

We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we theoretically prove that the learned distribution satisfies data fairness guarantees in terms of both representation rate and statistical rate. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted to continuous-domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.
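As a rough illustration of the two fairness notions named in the abstract, the sketch below tilts a fair base distribution q0 on a toy domain of (sensitive attribute s, label y) pairs into an exponential-family member q(x) ∝ q0(x) exp(θ T(x)), then measures the representation rate (smallest ratio P(s) / P(s') between sensitive groups) and the statistical rate (smallest ratio P(y=1 | s) / P(y=1 | s')). These definitions follow common usage of the terms and are an assumption here; the paper's exact definitions and its boosting procedure are not reproduced. The toy domain, the stump statistic, and the helper names `tilt`, `representation_rate` and `statistical_rate` are all hypothetical. The sketch only conveys a general intuition: if |θ T(x)| ≤ B everywhere, then q(x) / q0(x) lies in [e^{-2B}, e^{2B}], so a perfectly fair q0 can lose at most a factor e^{-4B} in representation rate and e^{-8B} in statistical rate.

```python
import math
from collections import defaultdict

# Hypothetical toy domain: x = (s, y), sensitive attribute s and binary label y.
DOMAIN = [(s, y) for s in (0, 1) for y in (0, 1)]


def tilt(q0, stat, theta):
    """Exponential-family tilt: q(x) is proportional to q0(x) * exp(theta * stat(x))."""
    weights = {x: q0[x] * math.exp(theta * stat(x)) for x in q0}
    z = sum(weights.values())  # normaliser (partition function)
    return {x: w / z for x, w in weights.items()}


def group_marginals(q):
    """Return P(s) and P(s, y=1) for each sensitive value s."""
    p_s, p_s_pos = defaultdict(float), defaultdict(float)
    for (s, y), p in q.items():
        p_s[s] += p
        if y == 1:
            p_s_pos[s] += p
    return p_s, p_s_pos


def representation_rate(q):
    """Smallest ratio P(s) / P(s') over all pairs of sensitive values."""
    p_s, _ = group_marginals(q)
    vals = list(p_s.values())
    return min(a / b for a in vals for b in vals)


def statistical_rate(q):
    """Smallest ratio P(y=1 | s) / P(y=1 | s') over all pairs of sensitive values."""
    p_s, p_s_pos = group_marginals(q)
    cond = [p_s_pos[s] / p_s[s] for s in p_s]
    return min(a / b for a in cond for b in cond)


def stump(x):
    """Hypothetical decision-stump statistic T(x) with values in {-1, +1}."""
    return 1.0 if x == (1, 1) else -1.0


# Fair but uninformative starting point: uniform over the domain.
q0 = {x: 1.0 / len(DOMAIN) for x in DOMAIN}

# Bounded coefficient, so |theta * T(x)| <= B for every x.
B = 0.5
q = tilt(q0, stump, theta=B)

print(representation_rate(q0), statistical_rate(q0))
print(round(representation_rate(q), 4), round(statistical_rate(q), 4))
print(math.exp(-4 * B), math.exp(-8 * B))  # worst-case lower bounds from the tilt bound
```

Starting from the uniform q0 (both rates equal to 1), the tilt moves mass toward the (s=1, y=1) cell, so both rates drop below 1 but stay above the worst-case bounds. Informally, shrinking B tightens the fairness floor at the cost of a smaller move away from q0, which mirrors the fit-versus-fairness trade-off the abstract describes.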

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-soen23a,
  title = {Fair Densities via Boosting the Sufficient Statistics of Exponential Families},
  author = {Soen, Alexander and Husain, Hisham and Nock, Richard},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {32105--32144},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/soen23a/soen23a.pdf},
  url = {https://proceedings.mlr.press/v202/soen23a.html},
  abstract = {We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned distribution will have a representation rate and statistical rate data fairness guarantee. Unlike recent optimization based pre-processing methods, our approach can be easily adapted for continuous domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are present to display the quality of result on real-world data.}
}
Endnote
%0 Conference Paper
%T Fair Densities via Boosting the Sufficient Statistics of Exponential Families
%A Alexander Soen
%A Hisham Husain
%A Richard Nock
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-soen23a
%I PMLR
%P 32105--32144
%U https://proceedings.mlr.press/v202/soen23a.html
%V 202
%X We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned distribution will have a representation rate and statistical rate data fairness guarantee. Unlike recent optimization based pre-processing methods, our approach can be easily adapted for continuous domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are present to display the quality of result on real-world data.
APA
Soen, A., Husain, H. & Nock, R. (2023). Fair Densities via Boosting the Sufficient Statistics of Exponential Families. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:32105-32144. Available from https://proceedings.mlr.press/v202/soen23a.html.
