On the limits of cross-domain generalization in automated X-ray prediction

Joseph Paul Cohen, Mohammad Hashir, Rupert Brooks, Hadrien Bertrand
Proceedings of the Third Conference on Medical Imaging with Deep Learning, PMLR 121:136-155, 2020.

Abstract

This large-scale study focuses on quantifying which X-ray diagnostic prediction tasks generalize well across multiple datasets. We present evidence that the issue of generalization is not due to a shift in the images but instead a shift in the labels. We study the cross-domain performance, agreement between models, and model representations. We find interesting discrepancies between performance and agreement: some models that both achieve good performance disagree in their predictions, while others agree yet achieve poor performance. We also test for concept similarity by regularizing a network to group tasks across multiple datasets together and observe variation across the tasks. All code is made available online and data is publicly available: https://github.com/mlmed/torchxrayvision

Cite this Paper


BibTeX
@InProceedings{pmlr-v121-cohen20a,
  title = {On the limits of cross-domain generalization in automated X-ray prediction},
  author = {Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},
  pages = {136--155},
  year = {2020},
  editor = {Tal Arbel and Ismail Ben Ayed and Marleen de Bruijne and Maxime Descoteaux and Herve Lombaert and Christopher Pal},
  volume = {121},
  series = {Proceedings of Machine Learning Research},
  address = {Montreal, QC, Canada},
  month = {06--08 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v121/cohen20a/cohen20a.pdf},
  url = {http://proceedings.mlr.press/v121/cohen20a.html},
  abstract = {This large scale study focuses on quantifying what X-rays diagnostic prediction tasks generalize well across multiple different datasets. We present evidence that the issue of generalization is not due to a shift in the images but instead a shift in the labels. We study the cross-domain performance, agreement between models, and model representations. We find interesting discrepancies between performance and agreement where models which both achieve good performance disagree in their predictions as well as models which agree yet achieve poor performance. We also test for concept similarity by regularizing a network to group tasks across multiple datasets together and observe variation across the tasks. All code is made available online and data is publicly available: {https://github.com/mlmed/torchxrayvision}.}
}
Endnote
%0 Conference Paper
%T On the limits of cross-domain generalization in automated X-ray prediction
%A Joseph Paul Cohen
%A Mohammad Hashir
%A Rupert Brooks
%A Hadrien Bertrand
%B Proceedings of the Third Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Tal Arbel
%E Ismail Ben Ayed
%E Marleen de Bruijne
%E Maxime Descoteaux
%E Herve Lombaert
%E Christopher Pal
%F pmlr-v121-cohen20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 136--155
%U http://proceedings.mlr.press
%V 121
%W PMLR
%X This large scale study focuses on quantifying what X-rays diagnostic prediction tasks generalize well across multiple different datasets. We present evidence that the issue of generalization is not due to a shift in the images but instead a shift in the labels. We study the cross-domain performance, agreement between models, and model representations. We find interesting discrepancies between performance and agreement where models which both achieve good performance disagree in their predictions as well as models which agree yet achieve poor performance. We also test for concept similarity by regularizing a network to group tasks across multiple datasets together and observe variation across the tasks. All code is made available online and data is publicly available: {https://github.com/mlmed/torchxrayvision}.
APA
Cohen, J.P., Hashir, M., Brooks, R. & Bertrand, H. (2020). On the limits of cross-domain generalization in automated X-ray prediction. Proceedings of the Third Conference on Medical Imaging with Deep Learning, in PMLR 121:136-155.

Related Material