Split Conformal Prediction under Data Contamination

Jason Clarkson, Wenkai Xu, Mihai Cucuringu, Gesine Reinert
Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 230:5-27, 2024.

Abstract

Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on clean test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.
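The split conformal procedure the abstract refers to can be sketched in a few lines. This is an illustrative toy, not the paper's Contamination Robust method: the calibration scores are drawn from a Beta distribution standing in for scores from a real fitted model, and `alpha` is the target miscoverage level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for calibration conformity scores s_i = 1 - p_model(y_i | x_i);
# in practice these come from a model fitted on a separate training split.
n_cal = 500
alpha = 0.1  # target miscoverage level
cal_scores = rng.beta(2, 8, size=n_cal)

# Split conformal threshold: the empirical quantile at level
# ceil((n+1)(1-alpha))/n, whose finite-sample correction yields
# marginal coverage >= 1 - alpha under exchangeability.
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q_hat = np.quantile(cal_scores, level, method="higher")

# Prediction set for a test point (classification): keep every
# candidate label whose conformity score is at most q_hat.
test_label_scores = rng.beta(2, 8, size=10)  # one score per candidate label
prediction_set = np.where(test_label_scores <= q_hat)[0]
```

The paper's contamination setting perturbs a small fraction of `cal_scores`, which shifts `q_hat` and hence the coverage and size of `prediction_set` on clean test points.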

Cite this Paper


BibTeX
@InProceedings{pmlr-v230-clarkson24a,
  title     = {Split Conformal Prediction under Data Contamination},
  author    = {Clarkson, Jason and Xu, Wenkai and Cucuringu, Mihai and Reinert, Gesine},
  booktitle = {Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications},
  pages     = {5--27},
  year      = {2024},
  editor    = {Vantini, Simone and Fontana, Matteo and Solari, Aldo and Boström, Henrik and Carlsson, Lars},
  volume    = {230},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v230/main/assets/clarkson24a/clarkson24a.pdf},
  url       = {https://proceedings.mlr.press/v230/clarkson24a.html},
  abstract  = {Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on clean test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.}
}
Endnote
%0 Conference Paper
%T Split Conformal Prediction under Data Contamination
%A Jason Clarkson
%A Wenkai Xu
%A Mihai Cucuringu
%A Gesine Reinert
%B Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Simone Vantini
%E Matteo Fontana
%E Aldo Solari
%E Henrik Boström
%E Lars Carlsson
%F pmlr-v230-clarkson24a
%I PMLR
%P 5--27
%U https://proceedings.mlr.press/v230/clarkson24a.html
%V 230
%X Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on clean test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.
APA
Clarkson, J., Xu, W., Cucuringu, M. & Reinert, G. (2024). Split Conformal Prediction under Data Contamination. Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, in Proceedings of Machine Learning Research 230:5-27. Available from https://proceedings.mlr.press/v230/clarkson24a.html.