WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales

Drew Prinster, Xing Han, Anqi Liu, Suchi Saria
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:49830-49859, 2025.

Abstract

Responsibly deploying artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings arguably requires not only proof of system reliability, but also continual, post-deployment monitoring to quickly detect and address any unsafe behavior. Methods for nonparametric sequential testing—especially conformal test martingales (CTMs) and anytime-valid inference—offer promising tools for this monitoring task. However, existing approaches are restricted to monitoring limited hypothesis classes or “alarm criteria” (e.g., detecting data shifts that violate certain exchangeability or IID assumptions), do not allow for online adaptation in response to shifts, and/or cannot diagnose the cause of degradation or alarm. In this paper, we address these limitations by proposing a weighted generalization of conformal test martingales (WCTMs), which lay a theoretical foundation for online monitoring for any unexpected changepoints in the data distribution while controlling false alarms. For practical applications, we propose specific WCTM algorithms that adapt online to mild covariate shifts (in the marginal input distribution), quickly detect harmful shifts, and diagnose those harmful shifts as concept shifts (in the conditional label distribution) or extreme (out-of-support) covariate shifts that cannot be easily adapted to. On real-world datasets, we demonstrate improved performance relative to state-of-the-art baselines.
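For readers unfamiliar with the abstract's core construct, below is a minimal sketch of a plain (unweighted) conformal test martingale, the standard baseline that the paper's WCTMs generalize. Everything here is an expository assumption rather than the paper's WATCH algorithm: the randomized conformal p-value, the power betting function, the residual-style score function, and all names (conformal_p_value, power_betting, ctm_monitor) are standard-CTM illustrations chosen by the editor.

import numpy as np

rng = np.random.default_rng(0)

def conformal_p_value(scores, rng):
    # Randomized conformal p-value of the newest score among all scores seen
    # so far; under exchangeability these p-values are i.i.d. Uniform(0, 1).
    s_all = np.asarray(scores)
    s_new = s_all[-1]
    greater = np.sum(s_all > s_new)
    equal = np.sum(s_all == s_new)  # counts the new point itself
    p = (greater + rng.uniform() * equal) / len(s_all)
    return max(p, 1e-12)  # guard against the measure-zero u == 0 draw

def power_betting(p, eps=0.5):
    # Power calibrator g(p) = eps * p**(eps - 1): it integrates to 1 over
    # [0, 1], so the running product of bets is a nonnegative martingale
    # under the exchangeability null hypothesis.
    return eps * p ** (eps - 1.0)

def ctm_monitor(stream, score_fn, alpha=0.01):
    # Alarm when the martingale ("wealth") reaches 1/alpha; by Ville's
    # inequality this bounds the false-alarm probability by alpha over the
    # entire, unbounded monitoring horizon (anytime-validity).
    scores, wealth = [], 1.0
    for t, (x, y) in enumerate(stream, start=1):
        scores.append(score_fn(x, y))          # e.g., residual |y - f(x)|
        wealth *= power_betting(conformal_p_value(scores, rng))
        if wealth >= 1.0 / alpha:
            return t                           # alarm: exchangeability rejected
    return None                                # no alarm raised

# Toy usage: nonconformity scores drift upward after t = 200.
pre  = [(None, abs(rng.normal()))        for _ in range(200)]
post = [(None, abs(rng.normal(loc=3.0))) for _ in range(200)]
print(ctm_monitor(pre + post, score_fn=lambda x, y: y))

Per the abstract, the paper's weighted generalization departs from this baseline by reweighting the ranking step (in the spirit of weighted conformal prediction) so that anticipated mild covariate shifts can be adapted to online rather than consuming the alarm budget; that weighting scheme is specific to the paper and is not reproduced in this sketch.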

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-prinster25a,
  title     = {{WATCH}: Adaptive Monitoring for {AI} Deployments via Weighted-Conformal Martingales},
  author    = {Prinster, Drew and Han, Xing and Liu, Anqi and Saria, Suchi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {49830--49859},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/prinster25a/prinster25a.pdf},
  url       = {https://proceedings.mlr.press/v267/prinster25a.html}
}
Endnote
%0 Conference Paper
%T WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales
%A Drew Prinster
%A Xing Han
%A Anqi Liu
%A Suchi Saria
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-prinster25a
%I PMLR
%P 49830--49859
%U https://proceedings.mlr.press/v267/prinster25a.html
%V 267
APA
Prinster, D., Han, X., Liu, A. & Saria, S. (2025). WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:49830-49859. Available from https://proceedings.mlr.press/v267/prinster25a.html.
