Revisiting Machine-Learning based Drug Repurposing: Drug Indications Are Not a Right Prediction Target

Siun Kim, Jung-Hyun Won, David Seung U Lee, Renqian Luo, Lijun Wu, Yingce Xia, Tao Qin, Howard Lee
Proceedings of the Conference on Health, Inference, and Learning, PMLR 209:100-116, 2023.

Abstract

In this paper, we challenge the utility of approved drug indications as a prediction target for machine learning in drug repurposing (DR) studies. Our research highlights two major limitations of this approach: 1) strong confounding between drug indications and drug characteristics data, which results in shortcut learning, and 2) inappropriate normalization of indications in existing drug-disease association (DDA) datasets, which leads to an overestimation of model performance. We show that drug characteristics data were collected in similar patterns for drugs of the same category, and that the Anatomical Therapeutic Chemical (ATC) classification of drugs could be predicted from these data collection patterns. Furthermore, we confirm that the performance of existing DR models degrades significantly in the realistic evaluation setting proposed in this study. We provide realistic data splits for two benchmark datasets, Fdataset and the deepDR dataset.
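The abstract does not specify how the realistic evaluation split is constructed, and the split files released for Fdataset and deepDR remain the reference for reproducing it. As a hedged illustration of why split design matters for drug-disease association benchmarks, the sketch contrasts a random pair-level split with a drug-level (grouped) split, in which every drug falls entirely in either training or test. The function names, the toy drug and disease labels, and the choice to group by drug are assumptions made for illustration only; this is not the procedure proposed in the paper.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple

# Hypothetical representation: one known association = (drug, disease).
Pair = Tuple[Hashable, Hashable]


def random_pair_split(pairs: Sequence[Pair], test_fraction: float = 0.2,
                      seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle individual pairs; the same drug can appear in train and test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def drug_level_split(pairs: Sequence[Pair], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Assign whole drugs to train or test, so no test drug is seen in training."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    # Sort first so the shuffle is reproducible regardless of set ordering.
    drugs = sorted({drug for drug, _ in pairs}, key=repr)
    random.Random(seed).shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


def shared_drugs(train: Sequence[Pair], test: Sequence[Pair]) -> Set[Hashable]:
    """Drugs present on both sides of a split (a simple leakage check)."""
    return {d for d, _ in train} & {d for d, _ in test}


# Toy, hypothetical associations for demonstration only.
pairs = [("drugA", "asthma"), ("drugA", "copd"), ("drugB", "asthma"),
         ("drugC", "hypertension"), ("drugC", "heart failure"),
         ("drugD", "migraine")]

train, test = drug_level_split(pairs, test_fraction=0.5, seed=1)
print(shared_drugs(train, test))  # set()
```

Grouping by drug is only one way to make evaluation stricter; a random pair split lets a model score well by recognizing drugs it has already seen, which is the kind of overestimation the abstract warns about. Grouping by disease, or by normalized disease concept, follows the same pattern with a different grouping key.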

Cite this Paper


BibTeX
@InProceedings{pmlr-v209-kim23a,
  title = {Revisiting Machine-Learning based Drug Repurposing: Drug Indications Are Not a Right Prediction Target},
  author = {Kim, Siun and Won, Jung-Hyun and Lee, David Seung U and Luo, Renqian and Wu, Lijun and Xia, Yingce and Qin, Tao and Lee, Howard},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages = {100--116},
  year = {2023},
  editor = {Mortazavi, Bobak J. and Sarker, Tasmie and Beam, Andrew and Ho, Joyce C.},
  volume = {209},
  series = {Proceedings of Machine Learning Research},
  month = {22 Jun--24 Jun},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v209/kim23a/kim23a.pdf},
  url = {https://proceedings.mlr.press/v209/kim23a.html},
  abstract = {In this paper, we challenge the utility of approved drug indications as a prediction target for machine learning in drug repurposing (DR) studies. Our research highlights two major limitations of this approach: 1) the presence of strong confounding between drug indications and drug characteristics data, which results in shortcut learning, and 2) inappropriate normalization of indications in existing drug-disease association (DDA) datasets, which leads to an overestimation of model performance. We show that the collection patterns of drug characteristics data were similar within drugs of the same category and the Anatomical Therapeutic Chemical (ATC) classification of drugs could be predicted by using the data collection patterns. Furthermore, we confirm that the performance of existing DR models is significantly degraded in the realistic evaluation setting we proposed in this study. We provide realistic data split information for two benchmark datasets, Fdataset and deepDR dataset.}
}
Endnote
%0 Conference Paper
%T Revisiting Machine-Learning based Drug Repurposing: Drug Indications Are Not a Right Prediction Target
%A Siun Kim
%A Jung-Hyun Won
%A David Seung U Lee
%A Renqian Luo
%A Lijun Wu
%A Yingce Xia
%A Tao Qin
%A Howard Lee
%B Proceedings of the Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Bobak J. Mortazavi
%E Tasmie Sarker
%E Andrew Beam
%E Joyce C. Ho
%F pmlr-v209-kim23a
%I PMLR
%P 100--116
%U https://proceedings.mlr.press/v209/kim23a.html
%V 209
%X In this paper, we challenge the utility of approved drug indications as a prediction target for machine learning in drug repurposing (DR) studies. Our research highlights two major limitations of this approach: 1) the presence of strong confounding between drug indications and drug characteristics data, which results in shortcut learning, and 2) inappropriate normalization of indications in existing drug-disease association (DDA) datasets, which leads to an overestimation of model performance. We show that the collection patterns of drug characteristics data were similar within drugs of the same category and the Anatomical Therapeutic Chemical (ATC) classification of drugs could be predicted by using the data collection patterns. Furthermore, we confirm that the performance of existing DR models is significantly degraded in the realistic evaluation setting we proposed in this study. We provide realistic data split information for two benchmark datasets, Fdataset and deepDR dataset.
APA
Kim, S., Won, J.-H., Lee, D.S.U., Luo, R., Wu, L., Xia, Y., Qin, T. & Lee, H. (2023). Revisiting Machine-Learning based Drug Repurposing: Drug Indications Are Not a Right Prediction Target. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 209:100-116. Available from https://proceedings.mlr.press/v209/kim23a.html.
