Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD Detection On Medical Tabular Data

Dennis Ulmer, Lotta Meijerink, Giovanni Cinà
Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:341-354, 2020.

Abstract

When deploying machine learning models in high-stakes real-world environments such as health care, it is crucial to accurately assess the uncertainty concerning a model’s prediction on abnormal inputs. However, there is a scarcity of literature analyzing this problem on medical data, especially on mixed-type tabular data such as Electronic Health Records. We close this gap by presenting a series of tests including a large variety of contemporary uncertainty estimation techniques, in order to determine whether they are able to identify out-of-distribution (OOD) patients. In contrast to previous work, we design tests on realistic and clinically relevant OOD groups, and run experiments on real-world medical data. We find that almost all techniques fail to achieve convincing results, partly disagreeing with earlier findings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v136-ulmer20a,
  title = {Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD Detection On Medical Tabular Data},
  author = {Ulmer, Dennis and Meijerink, Lotta and Cin\`a, Giovanni},
  booktitle = {Proceedings of the Machine Learning for Health NeurIPS Workshop},
  pages = {341--354},
  year = {2020},
  editor = {Alsentzer, Emily and McDermott, Matthew B. A. and Falck, Fabian and Sarkar, Suproteem K. and Roy, Subhrajit and Hyland, Stephanie L.},
  volume = {136},
  series = {Proceedings of Machine Learning Research},
  month = {11 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v136/ulmer20a/ulmer20a.pdf},
  url = {https://proceedings.mlr.press/v136/ulmer20a.html},
  abstract = {When deploying machine learning models in high-stakes real-world environments such as health care, it is crucial to accurately assess the uncertainty concerning a model’s prediction on abnormal inputs. However, there is a scarcity of literature analyzing this problem on medical data, especially on mixed-type tabular data such as Electronic Health Records. We close this gap by presenting a series of tests including a large variety of contemporary uncertainty estimation techniques, in order to determine whether they are able to identify out-of-distribution (OOD) patients. In contrast to previous work, we design tests on realistic and clinically relevant OOD groups, and run experiments on real-world medical data. We find that almost all techniques fail to achieve convincing results, partly disagreeing with earlier findings.}
}
Endnote
%0 Conference Paper
%T Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD Detection On Medical Tabular Data
%A Dennis Ulmer
%A Lotta Meijerink
%A Giovanni Cinà
%B Proceedings of the Machine Learning for Health NeurIPS Workshop
%C Proceedings of Machine Learning Research
%D 2020
%E Emily Alsentzer
%E Matthew B. A. McDermott
%E Fabian Falck
%E Suproteem K. Sarkar
%E Subhrajit Roy
%E Stephanie L. Hyland
%F pmlr-v136-ulmer20a
%I PMLR
%P 341--354
%U https://proceedings.mlr.press/v136/ulmer20a.html
%V 136
%X When deploying machine learning models in high-stakes real-world environments such as health care, it is crucial to accurately assess the uncertainty concerning a model’s prediction on abnormal inputs. However, there is a scarcity of literature analyzing this problem on medical data, especially on mixed-type tabular data such as Electronic Health Records. We close this gap by presenting a series of tests including a large variety of contemporary uncertainty estimation techniques, in order to determine whether they are able to identify out-of-distribution (OOD) patients. In contrast to previous work, we design tests on realistic and clinically relevant OOD groups, and run experiments on real-world medical data. We find that almost all techniques fail to achieve convincing results, partly disagreeing with earlier findings.
APA
Ulmer, D., Meijerink, L. & Cinà, G. (2020). Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD Detection On Medical Tabular Data. Proceedings of the Machine Learning for Health NeurIPS Workshop, in Proceedings of Machine Learning Research 136:341-354. Available from https://proceedings.mlr.press/v136/ulmer20a.html.

Related Material