Detecting and mitigating issues in image-based COVID-19 diagnosis
Proceedings of the 1st Workshop on Healthcare AI and COVID-19, ICML 2022, PMLR 184:127-135, 2022.
Abstract
As urgency over the coronavirus disease 2019 (COVID-19) increased, many datasets of chest radiography (CXR) and chest computed tomography (CT) images emerged, aimed at the detection and prognosis of COVID-19. Over the last two years, thousands of studies have been published reporting promising results. However, a deeper analysis of the datasets and the methods employed reveals issues that may undermine their conclusions and practical applicability. We investigate three major datasets commonly used in these studies, detect problems related to the existence of duplicates, address the specificity of classes within those datasets, and propose a way to perform external validation via cross-dataset evaluation. Our guidelines and findings contribute towards a trustworthy application of Machine Learning in the context of image-based diagnosis and offer a more accurate assessment of models applied to the prognostication of diseases using image datasets. They also pave the way towards models that can be relied upon in the real world.