Distribution-free risk assessment of regression-based machine learning algorithms

Sukrita Singh, Neeraj Sarna, Yuanyuan Li, Yang Lin, Agni Orfanoudaki, Michael Berger
Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 230:44-64, 2024.

Abstract

In safety-critical applications, such as medicine and healthcare, decision makers are hesitant to deploy machine learning (ML) models unless the expected algorithmic errors are guaranteed to remain within pre-defined tolerances. However, since ML algorithms are statistical in nature, a bounded error cannot be ensured for all possible data inputs. Instead, practitioners can be provided with an estimate of the probability that the error exceeds the pre-defined tolerance interval. This allows them to better anticipate high-magnitude ML errors and manage them more effectively. We refer to this as the risk-assessment problem and propose a novel solution for it: a conformal prediction approach that translates the risk-assessment task into a prediction interval generation problem. Conformal prediction yields prediction intervals that are guaranteed to contain the true target variable with a given probability. Using this coverage property, we prove that our risk-assessment approach is conservative, i.e., under weak assumptions, the risk we compute is not lower than the true risk resulting from the ML algorithm. We focus on regression tasks and computationally study the performance of the proposed method, both with and without covariate shift, and compare it with other related methods. We find that our method offers superior accuracy while being conservative.
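
To make the idea in the abstract concrete, the following is a minimal sketch of how a conformal coverage guarantee can be turned into a conservative risk estimate. It is not the authors' algorithm: it assumes plain split conformal prediction with absolute-residual scores and exchangeable calibration and test data (so no covariate shift), and the function name `conservative_risk` and the toy residuals are hypothetical.

```python
import numpy as np


def conservative_risk(cal_residuals, tolerance):
    """Upper-bound P(|error| > tolerance) for a new point.

    Assumption (not from the paper): split conformal prediction with
    absolute-residual scores on a held-out calibration set that is
    exchangeable with the test point.

    If k calibration scores are <= tolerance, the k-th smallest score is
    <= tolerance and the conformal guarantee gives
        P(|error| <= tolerance) >= k / (n + 1),
    so the risk is at most (n - k + 1) / (n + 1).
    """
    scores = np.abs(np.asarray(cal_residuals, dtype=float))
    n = scores.size
    exceed = int(np.sum(scores > tolerance))  # this is n - k
    return (exceed + 1) / (n + 1)


# Toy calibration residuals y_i - f(x_i); values are illustrative only.
residuals = [0.1, -0.5, 1.2, 0.3]
print(conservative_risk(residuals, tolerance=1.0))  # (1 + 1) / (4 + 1) = 0.4
```

The `+ 1` in numerator and denominator is what makes the estimate err on the high side: with few calibration points the bound stays close to 1, and it approaches the empirical exceedance rate as the calibration set grows.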

Cite this Paper


BibTeX
@InProceedings{pmlr-v230-singh24a,
  title = {Distribution-free risk assessment of regression-based machine learning algorithms},
  author = {Singh, Sukrita and Sarna, Neeraj and Li, Yuanyuan and Lin, Yang and Orfanoudaki, Agni and Berger, Michael},
  booktitle = {Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications},
  pages = {44--64},
  year = {2024},
  editor = {Vantini, Simone and Fontana, Matteo and Solari, Aldo and Boström, Henrik and Carlsson, Lars},
  volume = {230},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v230/main/assets/singh24a/singh24a.pdf},
  url = {https://proceedings.mlr.press/v230/singh24a.html}
}
Endnote
%0 Conference Paper
%T Distribution-free risk assessment of regression-based machine learning algorithms
%A Sukrita Singh
%A Neeraj Sarna
%A Yuanyuan Li
%A Yang Lin
%A Agni Orfanoudaki
%A Michael Berger
%B Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Simone Vantini
%E Matteo Fontana
%E Aldo Solari
%E Henrik Boström
%E Lars Carlsson
%F pmlr-v230-singh24a
%I PMLR
%P 44--64
%U https://proceedings.mlr.press/v230/singh24a.html
%V 230
APA
Singh, S., Sarna, N., Li, Y., Lin, Y., Orfanoudaki, A. & Berger, M. (2024). Distribution-free risk assessment of regression-based machine learning algorithms. Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, in Proceedings of Machine Learning Research 230:44-64. Available from https://proceedings.mlr.press/v230/singh24a.html.
