Interpreting Dataset Shift in Clinical Notes

Shariar Vaez-Ghaemi, Furong Jia, Monica Agrawal
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:243-262, 2026.

Abstract

Distribution shift can lead to degradation in the performance of machine learning models. This concern is particularly salient in medicine, in which several forces can lead to shifts in Electronic Health Record ({EHR}) data. Distribution shift in the text domain is vastly understudied, but increasingly important, given the widespread integration of large language models into clinical workflows. Identifying the existence of a shift is necessary but insufficient; actionability often requires understanding the nature of the shift. To address this challenge, we establish an extensible benchmark suite that induces synthetic distribution shifts using real clinical notes and develop two methods to assess generated shift explanations. We further introduce {SIReNs}, a general-domain end-to-end approach that explains distributional differences between two datasets by selecting representative notes from each. The {SIReNs} method was evaluated on both binary and continuous feature shifts, and the results show that it recovers salient binary shifts well, but struggles with more subtle shifts. A substantial gap remains to a ground-truth oracle for continuous shifts, suggesting room for improvement in future methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-vaez-ghaemi26a,
  title     = {Interpreting Dataset Shift in Clinical Notes},
  author    = {Vaez-Ghaemi, Shariar and Jia, Furong and Agrawal, Monica},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {243--262},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/vaez-ghaemi26a/vaez-ghaemi26a.pdf},
  url       = {https://proceedings.mlr.press/v297/vaez-ghaemi26a.html},
  abstract  = {Distribution shift can lead to degradation in the performance of machine learning models. This concern is particularly salient in medicine, in which several forces can lead to shifts in Electronic Health Record ({EHR}) data. Distribution shift in the text domain is vastly understudied, but increasingly important, given the widespread integration of large language models into clinical workflows. Identifying the existence of a shift is necessary but insufficient; actionability often requires understanding the nature of the shift. To address this challenge, we establish an extensible benchmark suite that induces synthetic distribution shifts using real clinical notes and develop two methods to assess generated shift explanations. We further introduce {SIReNs}, a general-domain end-to-end approach that explains distributional differences between two datasets by selecting representative notes from each. The {SIReNs} method was evaluated on both binary and continuous feature shifts, and the results show that it recovers salient binary shifts well, but struggles with more subtle shifts. A substantial gap remains to a ground-truth oracle for continuous shifts, suggesting room for improvement in future methods.}
}
Endnote
%0 Conference Paper %T Interpreting Dataset Shift in Clinical Notes %A Shariar Vaez-Ghaemi %A Furong Jia %A Monica Agrawal %B Proceedings of the Fifth Machine Learning for Health Symposium %C Proceedings of Machine Learning Research %D 2026 %E Peniel Argaw %E Haoran Zhang %E Sarah Jabbour %E Payal Chandak %E Jerry Ji %E Sumit Mukherjee %E Olawale Salaudeen %E Trenton Chang %E Elizabeth Healey %E Fabian Gröger %E Amin Adibi %E Stefan Hegselmann %E Benjamin Wild %E Ayush Noori %F pmlr-v297-vaez-ghaemi26a %I PMLR %P 243--262 %U https://proceedings.mlr.press/v297/vaez-ghaemi26a.html %V 297 %X Distribution shift can lead to degradation in the performance of machine learning models. This concern is particularly salient in medicine, in which several forces can lead to shifts in Electronic Health Record ({EHR}) data. Distribution shift in the text domain is vastly understudied, but increasingly important, given the widespread integration of large language models into clinical workflows. Identifying the existence of a shift is necessary but insufficient; actionability often requires understanding the nature of the shift. To address this challenge, we establish an extensible benchmark suite that induces synthetic distribution shifts using real clinical notes and develop two methods to assess generated shift explanations. We further introduce {SIReNs}, a general-domain end-to-end approach that explains distributional differences between two datasets by selecting representative notes from each. The {SIReNs} method was evaluated on both binary and continuous feature shifts, and the results show that it recovers salient binary shifts well, but struggles with more subtle shifts. A substantial gap remains to a ground-truth oracle for continuous shifts, suggesting room for improvement in future methods.
APA
Vaez-Ghaemi, S., Jia, F. & Agrawal, M. (2026). Interpreting Dataset Shift in Clinical Notes. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:243-262. Available from https://proceedings.mlr.press/v297/vaez-ghaemi26a.html.