Learning with Selectively Labeled Data from Multiple Decision-makers

Jian Chen, Zhehao Li, Xiaojie Mao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:8480-8519, 2025.

Abstract

We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25al,
  title = {Learning with Selectively Labeled Data from Multiple Decision-makers},
  author = {Chen, Jian and Li, Zhehao and Mao, Xiaojie},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {8480--8519},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25al/chen25al.pdf},
  url = {https://proceedings.mlr.press/v267/chen25al.html},
  abstract = {We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.}
}
Endnote
%0 Conference Paper
%T Learning with Selectively Labeled Data from Multiple Decision-makers
%A Jian Chen
%A Zhehao Li
%A Xiaojie Mao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chen25al
%I PMLR
%P 8480--8519
%U https://proceedings.mlr.press/v267/chen25al.html
%V 267
%X We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.
APA
Chen, J., Li, Z., & Mao, X. (2025). Learning with Selectively Labeled Data from Multiple Decision-makers. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:8480-8519. Available from https://proceedings.mlr.press/v267/chen25al.html.
