A Causal Framework for Evaluating Deferring Systems

Filippo Palomba, Andrea Pugnana, Jose Manuel Alvarez, Salvatore Ruggieri
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2143-2151, 2025.

Abstract

Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems, which allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first, we have access to both the human and ML model predictions for the deferred instances. Here, we can identify the individual causal effects for deferred instances and their aggregates. In the second, only human predictions are available for the deferred instances. Here, we can resort to a regression discontinuity design to estimate a local causal effect. We evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.
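
The first scenario in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a classification task where correctness is a 0/1 outcome, and the function name `individual_effects` and the toy data are hypothetical. For each deferred instance, the effect of deferring is taken as the human's correctness minus the ML model's correctness, and the aggregate is the mean over deferred instances.

```python
from statistics import mean


def individual_effects(y_true, human_pred, ml_pred, deferred):
    """Per-instance effect of deferring on correctness (hypothetical helper).

    For each deferred instance, returns
    1[human correct] - 1[ML model correct], a value in {-1, 0, 1}.
    Non-deferred instances are skipped.
    """
    effects = []
    for y, h, m, d in zip(y_true, human_pred, ml_pred, deferred):
        if d:
            effects.append(int(h == y) - int(m == y))
    return effects


# Toy data (assumed for illustration only).
y_true = [1, 0, 1, 1, 0]
human_pred = [1, 0, 0, 1, 0]
ml_pred = [0, 0, 1, 1, 1]
deferred = [True, False, True, True, False]

effects = individual_effects(y_true, human_pred, ml_pred, deferred)
avg_effect = mean(effects)  # aggregate effect over the deferred instances
print(effects, avg_effect)
```

A positive average would suggest that deferring improved accuracy on the deferred instances, a negative one that the ML model would have done better. The second scenario, where ML predictions are missing for deferred instances, needs a different estimator and is not covered by this sketch.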

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-palomba25a,
  title = {A Causal Framework for Evaluating Deferring Systems},
  author = {Palomba, Filippo and Pugnana, Andrea and Alvarez, Jose Manuel and Ruggieri, Salvatore},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {2143--2151},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/palomba25a/palomba25a.pdf},
  url = {https://proceedings.mlr.press/v258/palomba25a.html},
  abstract = {Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems, which allows to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we have access to both the human and ML model predictions for the deferred instances. Here, we can identify the individual causal effects for deferred instances and the aggregates of them. In the second one, only human predictions are available for the deferred instances. Here, we can resort to regression discontinuity design to estimate a local causal effect. We evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.}
}
Endnote
%0 Conference Paper
%T A Causal Framework for Evaluating Deferring Systems
%A Filippo Palomba
%A Andrea Pugnana
%A Jose Manuel Alvarez
%A Salvatore Ruggieri
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-palomba25a
%I PMLR
%P 2143--2151
%U https://proceedings.mlr.press/v258/palomba25a.html
%V 258
%X Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems, which allows to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we have access to both the human and ML model predictions for the deferred instances. Here, we can identify the individual causal effects for deferred instances and the aggregates of them. In the second one, only human predictions are available for the deferred instances. Here, we can resort to regression discontinuity design to estimate a local causal effect. We evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.
APA
Palomba, F., Pugnana, A., Alvarez, J. M., & Ruggieri, S. (2025). A Causal Framework for Evaluating Deferring Systems. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2143-2151. Available from https://proceedings.mlr.press/v258/palomba25a.html.
