KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Benson Chen, Tomasz Danel, Gabriel H. S. Dreiman, Patrick J. Mcenaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Kent Gorday, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:8002-8021, 2025.

Abstract

DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25o,
  title = {{K}in{DEL}: {DNA}-Encoded Library Dataset for Kinase Inhibitors},
  author = {Chen, Benson and Danel, Tomasz and Dreiman, Gabriel H. S. and Mcenaney, Patrick J. and Jain, Nikhil and Novikov, Kirill and Akki, Spurti Umesh and Turnbull, Joshua L. and Pandya, Virja Atul and Belotserkovskii, Boris P. and Weaver, Jared Bryce and Biswas, Ankita and Nguyen, Dat and Gorday, Kent and Sultan, Mohammad and Stanley, Nathaniel and Whalen, Daniel M and Kanichar, Divya and Klein, Christoph and Fox, Emily and Watts, R. Edward},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {8002--8021},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25o/chen25o.pdf},
  url = {https://proceedings.mlr.press/v267/chen25o.html},
  abstract = {DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.}
}
Endnote
%0 Conference Paper
%T KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors
%A Benson Chen
%A Tomasz Danel
%A Gabriel H. S. Dreiman
%A Patrick J. Mcenaney
%A Nikhil Jain
%A Kirill Novikov
%A Spurti Umesh Akki
%A Joshua L. Turnbull
%A Virja Atul Pandya
%A Boris P. Belotserkovskii
%A Jared Bryce Weaver
%A Ankita Biswas
%A Dat Nguyen
%A Kent Gorday
%A Mohammad Sultan
%A Nathaniel Stanley
%A Daniel M Whalen
%A Divya Kanichar
%A Christoph Klein
%A Emily Fox
%A R. Edward Watts
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chen25o
%I PMLR
%P 8002--8021
%U https://proceedings.mlr.press/v267/chen25o.html
%V 267
%X DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.
APA
Chen, B., Danel, T., Dreiman, G.H.S., Mcenaney, P.J., Jain, N., Novikov, K., Akki, S.U., Turnbull, J.L., Pandya, V.A., Belotserkovskii, B.P., Weaver, J.B., Biswas, A., Nguyen, D., Gorday, K., Sultan, M., Stanley, N., Whalen, D.M., Kanichar, D., Klein, C., Fox, E. & Watts, R.E. (2025). KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:8002-8021. Available from https://proceedings.mlr.press/v267/chen25o.html.
