Revisiting Machine-Learning based Drug Repurposing: Drug Indications Are Not a Right Prediction Target
Proceedings of the Conference on Health, Inference, and Learning, PMLR 209:100-116, 2023.
In this paper, we challenge the utility of approved drug indications as a prediction target for machine learning in drug repurposing (DR) studies. Our research highlights two major limitations of this approach: 1) strong confounding between drug indications and drug characteristics data, which results in shortcut learning, and 2) inappropriate normalization of indications in existing drug-disease association (DDA) datasets, which leads to an overestimation of model performance. We show that drugs of the same category have similar collection patterns of drug characteristics data, and that the Anatomical Therapeutic Chemical (ATC) classification of drugs can be predicted from these data collection patterns. Furthermore, we confirm that the performance of existing DR models degrades significantly in the realistic evaluation setting we propose in this study. We provide realistic data splits for two benchmark datasets, Fdataset and the deepDR dataset.
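To make the confounding argument concrete, here is a toy sketch (not the paper's experiment) of how the mere pattern of which data sources are available for a drug can reveal its category. All drug names, data sources, availability patterns, and ATC labels below are synthetic and hypothetical. The leave-one-out 1-nearest-neighbor classifier on Hamming distance is an illustrative choice, not necessarily the classifier used in the study.

```python
from collections import Counter

# Hypothetical data sources; 1 = data collected for this drug, 0 = missing.
SOURCES = ["target", "side_effect", "pathway", "chemical_structure"]

# Synthetic drugs: (collection pattern, first-level ATC code). Illustrative only.
drugs = {
    "drug_a": ((1, 1, 0, 1), "N"),
    "drug_b": ((1, 1, 0, 1), "N"),
    "drug_c": ((1, 1, 1, 1), "N"),
    "drug_d": ((0, 1, 1, 1), "C"),
    "drug_e": ((0, 1, 1, 1), "C"),
    "drug_f": ((0, 0, 1, 1), "C"),
    "drug_g": ((1, 0, 0, 1), "L"),
    "drug_h": ((1, 0, 0, 1), "L"),
}


def hamming(u, v):
    """Number of sources whose availability differs between two drugs."""
    return sum(a != b for a, b in zip(u, v))


def loo_nearest_neighbor_accuracy(data):
    """Leave-one-out 1-NN that sees only collection patterns, never feature values."""
    names = sorted(data)
    correct = 0
    for held_out in names:
        pattern, label = data[held_out]
        # Nearest other drug by Hamming distance; ties broken by drug name.
        nearest = min(
            (n for n in names if n != held_out),
            key=lambda n: (hamming(pattern, data[n][0]), n),
        )
        correct += data[nearest][1] == label
    return correct / len(names)


def majority_baseline_accuracy(data):
    """Accuracy of always predicting the most frequent ATC code."""
    counts = Counter(label for _, label in data.values())
    return max(counts.values()) / len(data)


print(loo_nearest_neighbor_accuracy(drugs))  # 1.0
print(majority_baseline_accuracy(drugs))     # 0.375
```

On this deliberately constructed data, the availability pattern alone identifies each drug's category far better than the majority-class baseline. A model trained on such data could therefore reach high accuracy through this shortcut without learning any drug-disease biology.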