Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)

Drew Prinster, Samuel Don Stanton, Anqi Liu, Suchi Saria
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:41086-41118, 2024.

Abstract

As artificial intelligence (AI) and machine learning (ML) systems gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants’ validity guarantees have assumed some form of “quasi-exchangeability” on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to any joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
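The coverage guarantee the abstract refers to is easiest to see in the standard split conformal setting, which this paper generalizes beyond exchangeability. The sketch below is a minimal illustration in Python with NumPy; all function and variable names are our own illustrative assumptions, not the authors' implementation. It shows the exchangeable baseline and the likelihood-ratio-weighted variant (in the sense of Tibshirani et al., 2019) that handles covariate shift, the prior "quasi-exchangeable" setting the paper extends.

```python
# Minimal sketch of split conformal prediction (the exchangeable baseline)
# and its weighted variant for covariate shift. All names here are
# illustrative assumptions, not the authors' code.
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Interval from absolute calibration residuals |y_i - yhat_i|.

    If calibration and test points are exchangeable, the interval covers
    the true test label with probability at least 1 - alpha.
    """
    n = len(residuals_cal)
    # Finite-sample conformal quantile level: ceil((n+1)(1-alpha)) / n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals_cal, level, method="higher")
    return y_pred_test - q, y_pred_test + q

def weighted_conformal_interval(residuals_cal, w_cal, w_test,
                                y_pred_test, alpha=0.1):
    """Weighted conformal interval under covariate shift.

    w_cal[i] and w_test are likelihood ratios dP_test(x) / dP_train(x)
    at the calibration and test covariates; with all weights equal this
    reduces to the unweighted interval above.
    """
    # Normalize weights; the unknown test score gets a point mass at +inf.
    p = np.append(w_cal, w_test)
    p = p / p.sum()
    order = np.argsort(residuals_cal)
    sorted_res = np.asarray(residuals_cal)[order]
    cum = np.cumsum(p[:-1][order])
    # Smallest residual whose cumulative weight reaches 1 - alpha, else +inf.
    idx = np.searchsorted(cum, 1 - alpha)
    q = sorted_res[idx] if idx < len(sorted_res) else np.inf
    return y_pred_test - q, y_pred_test + q
```

Per the abstract, the paper's contribution is to show that analogous quantile adjustments exist for arbitrary joint data distributions, including the sequential feedback-loop shifts induced when an AI/ML agent chooses its own queries, and to outline a procedure for deriving tractable algorithms in such cases.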

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-prinster24a,
  title     = {Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)},
  author    = {Prinster, Drew and Stanton, Samuel Don and Liu, Anqi and Saria, Suchi},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {41086--41118},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/prinster24a/prinster24a.pdf},
  url       = {https://proceedings.mlr.press/v235/prinster24a.html}
}
APA
Prinster, D., Stanton, S.D., Liu, A. & Saria, S. (2024). Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them). Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:41086-41118. Available from https://proceedings.mlr.press/v235/prinster24a.html.
