Distribution-free risk assessment of regression-based machine learning algorithms
Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 230:44-64, 2024.
Abstract
In safety-critical applications, such as medicine and healthcare, decision makers are hesitant to deploy machine learning (ML) models unless the expected algorithmic errors are guaranteed to remain within pre-defined tolerances. However, since ML algorithms are statistical in nature, a bounded error cannot be ensured for all possible data inputs. Instead, practitioners can be provided with an estimate of the probability that the error exceeds the pre-defined tolerance interval, enabling them to better anticipate high-magnitude ML errors and manage them more effectively. We refer to this as the risk-assessment problem and propose a novel solution for it: a conformal prediction approach that translates the risk-assessment task into a prediction-interval generation problem. Conformal prediction produces intervals that are guaranteed to contain the true target variable with a given probability. Using this coverage property, we prove that our risk-assessment approach is conservative, i.e., under weak assumptions, the risk we compute is not lower than the true risk incurred by the ML algorithm. We focus on regression tasks and computationally study the performance of the proposed method, comparing it with related methods both with and without covariate shift. We find that our method offers superior accuracy while remaining conservative.
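To make the abstract's idea concrete, here is a minimal sketch of how split conformal prediction can be turned into a conservative risk estimate for a regression model. This is an illustration under standard split-conformal assumptions (exchangeable data), not the paper's exact algorithm; the data, the `risk_estimate` and `prediction_interval` helpers, and all parameter choices are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic regression data (illustration only).
X = rng.normal(size=(2000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=2000)

# Split into a proper training set and a calibration set.
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)

def risk_estimate(epsilon):
    """Conservative estimate of P(|y - y_hat| > epsilon).

    The +1 corrections are the usual finite-sample adjustment in split
    conformal prediction; under exchangeability, the estimate does not
    fall below the true exceedance probability in expectation.
    """
    return (np.sum(scores > epsilon) + 1) / (n + 1)

def prediction_interval(x, alpha=0.1):
    """Split-conformal interval with >= 1 - alpha marginal coverage."""
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    y_hat = model.predict(x)
    return y_hat - q, y_hat + q

# Risk assessment: probability the error exceeds a tolerance of 1.0.
print("Estimated risk that |error| > 1.0:", risk_estimate(1.0))
```

The link between the two functions is the abstract's key observation: the tolerance `epsilon` plays the role of a prediction-interval half-width, so the smallest miscoverage level `alpha` whose conformal interval fits inside the tolerance band upper-bounds the risk of exceeding it.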