DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift

Yongsen Tan; Zhecheng Sheng; Xiruo Ding; Serguei V S Pakhomov; Trevor Cohen

Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:721-753, 2026.

Abstract

Despite the burgeoning body of work on distribution shifts, provenance shift—where the relationship between data source and label changes at deployment—remains poorly understood and under-addressed. In this paper, we establish a formal connection between provenance shift, counterfactual invariance, and invariant learning to derive a learning objective for robustness. We then introduce DeconDTN-Toolkit, a specialized evaluation and remediation suite designed to simulate provenance shifts of varying degrees while maintaining the training protocol and the infrastructure of existing benchmarks. We reveal the vulnerability of Empirical Risk Minimization under provenance shift, introduce a robust out-of-distribution performance indicator, and conduct a comprehensive evaluation on existing algorithms. Our work provides both the theoretical grounding and the practical tools necessary to characterize the problem of confounding by provenance, and implementations of methods to mitigate it.

Cite this Paper

BibTeX

@InProceedings{pmlr-v333-tan26a,
  title = 	 {DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift},
  author =       {Tan, Yongsen and Sheng, Zhecheng and Ding, Xiruo and Pakhomov, Serguei V S and Cohen, Trevor},
  booktitle = 	 {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = 	 {721--753},
  year = 	 {2026},
  editor = 	 {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = 	 {333},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--30 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v333/main/assets/tan26a/tan26a.pdf},
  url = 	 {https://proceedings.mlr.press/v333/tan26a.html},
  abstract = 	 {Despite the burgeoning body of work on distribution shifts, provenance shift—where the relationship between data source and label changes at deployment—remains poorly understood and under-addressed. In this paper, we establish a formal connection between provenance shift, counterfactual invariance, and invariant learning to derive a learning objective for robustness. We then introduce DeconDTN-Toolkit, a specialized evaluation and remediation suite designed to simulate provenance shifts of varying degrees while maintaining the training protocol and the infrastructure of existing benchmarks. We reveal the vulnerability of Empirical Risk Minimization under provenance shift, introduce a robust out-of-distribution performance indicator, and conduct a comprehensive evaluation on existing algorithms. Our work provides both the theoretical grounding and the practical tools necessary to characterize the problem of confounding by provenance, and implementations of methods to mitigate it.}
}

Endnote

%0 Conference Paper
%T DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift
%A Yongsen Tan
%A Zhecheng Sheng
%A Xiruo Ding
%A Serguei V S Pakhomov
%A Trevor Cohen
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-tan26a
%I PMLR
%P 721--753
%U https://proceedings.mlr.press/v333/tan26a.html
%V 333
%X Despite the burgeoning body of work on distribution shifts, provenance shift—where the relationship between data source and label changes at deployment—remains poorly understood and under-addressed. In this paper, we establish a formal connection between provenance shift, counterfactual invariance, and invariant learning to derive a learning objective for robustness. We then introduce DeconDTN-Toolkit, a specialized evaluation and remediation suite designed to simulate provenance shifts of varying degrees while maintaining the training protocol and the infrastructure of existing benchmarks. We reveal the vulnerability of Empirical Risk Minimization under provenance shift, introduce a robust out-of-distribution performance indicator, and conduct a comprehensive evaluation on existing algorithms. Our work provides both the theoretical grounding and the practical tools necessary to characterize the problem of confounding by provenance, and implementations of methods to mitigate it.

APA

Tan, Y., Sheng, Z., Ding, X., Pakhomov, S.V.S. & Cohen, T. (2026). DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:721-753. Available from https://proceedings.mlr.press/v333/tan26a.html.
