Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach

Vijay Keswani, Anay Mehrotra, L. Elisa Celis
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:23547-23576, 2024.

Abstract

In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all subpopulations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.
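To make the partial-feedback setup concrete, below is a minimal sketch of one plausible exploration-based data-collection loop: true outcomes are recorded only for accepted samples, and a small, group-specific fraction of predicted negatives is also accepted so that every subpopulation keeps contributing outcome data. The per-group exploration probabilities (explore_rate) and the logistic-regression model are illustrative assumptions, not the paper's actual strategy family or its guarantees.

# Minimal illustrative sketch of exploration-based data collection under
# partial feedback. NOT the paper's algorithm: the per-group exploration
# probabilities `explore_rate` and the logistic-regression model are
# assumptions made purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def run_rounds(stream, n_rounds, explore_rate, seed=0):
    """stream(t) yields numpy arrays (X, group, y_true) for round t;
    `explore_rate` maps each group label to an exploration probability.
    True outcomes y_true are revealed only for accepted samples."""
    rng = np.random.default_rng(seed)
    X_obs, y_obs = [], []          # outcome data collected so far
    model = None
    for t in range(n_rounds):
        X, group, y_true = stream(t)
        if model is None:
            accept = np.ones(len(X), dtype=bool)     # no model yet: accept all
        else:
            pred_pos = model.predict(X).astype(bool)
            # Exploration: also accept a random, group-specific fraction of
            # predicted negatives so no subpopulation's outcomes go unobserved.
            eps = np.array([explore_rate[g] for g in group])
            accept = pred_pos | ((~pred_pos) & (rng.random(len(X)) < eps))
        # Partial feedback: outcomes are recorded only for accepted samples.
        X_obs.append(X[accept])
        y_obs.append(y_true[accept])
        X_train, y_train = np.vstack(X_obs), np.concatenate(y_obs)
        if len(np.unique(y_train)) > 1:               # need both classes to fit
            model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model

In such a sketch, assigning higher exploration rates to historically under-accepted groups is one way a strategy could encode group-fairness goals, at the cost of admitting more false positives during exploration.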

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-keswani24a,
  title     = {Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach},
  author    = {Keswani, Vijay and Mehrotra, Anay and Celis, L. Elisa},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {23547--23576},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/keswani24a/keswani24a.pdf},
  url       = {https://proceedings.mlr.press/v235/keswani24a.html},
  abstract  = {In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all sub-populations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.}
}
Endnote
%0 Conference Paper
%T Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach
%A Vijay Keswani
%A Anay Mehrotra
%A L. Elisa Celis
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-keswani24a
%I PMLR
%P 23547--23576
%U https://proceedings.mlr.press/v235/keswani24a.html
%V 235
%X In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all sub-populations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.
APA
Keswani, V., Mehrotra, A. & Celis, L. E. (2024). Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:23547-23576. Available from https://proceedings.mlr.press/v235/keswani24a.html.