Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation

Zhilin Huang, Ling Yang, Xiangxin Zhou, Chujun Qin, Yijie Yu, Xiawu Zheng, Zikun Zhou, Wentao Zhang, Yu Wang, Wenming Yang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:20348-20364, 2024.

Abstract

Generating ligand molecules that bind to specific protein targets via generative models holds substantial promise for advancing structure-based drug design. Existing methods generate molecules from scratch without reference or template ligands, which poses challenges in model optimization and may yield suboptimal outcomes. To address this problem, we propose an innovative interaction-based retrieval-augmented diffusion model named IRDiff to facilitate target-aware molecule generation. IRDiff leverages a curated set of ligand references, i.e., those with desired properties such as high binding affinity, to steer the diffusion model towards synthesizing ligands that satisfy design criteria. Specifically, we utilize a protein-molecule interaction network (PMINet), which is pretrained with binding affinity signals to: (i) retrieve target-aware ligand molecules with high binding affinity to serve as references, and (ii) incorporate essential protein-ligand binding structures for steering molecular diffusion generation with two effective augmentation mechanisms, i.e., retrieval augmentation and self augmentation. Empirical studies on CrossDocked2020 dataset show IRDiff can generate molecules with more realistic 3D structures and achieve state-of-the-art binding affinities towards the protein targets, while maintaining proper molecular properties. The codes and models are available at https://github.com/YangLing0818/IRDiff

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-huang24ab, title = {Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3{D} Molecule Generation}, author = {Huang, Zhilin and Yang, Ling and Zhou, Xiangxin and Qin, Chujun and Yu, Yijie and Zheng, Xiawu and Zhou, Zikun and Zhang, Wentao and Wang, Yu and Yang, Wenming}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {20348--20364}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/huang24ab/huang24ab.pdf}, url = {https://proceedings.mlr.press/v235/huang24ab.html}, abstract = {Generating ligand molecules that bind to specific protein targets via generative models holds substantial promise for advancing structure-based drug design. Existing methods generate molecules from scratch without reference or template ligands, which poses challenges in model optimization and may yield suboptimal outcomes. To address this problem, we propose an innovative interaction-based retrieval-augmented diffusion model named IRDiff to facilitate target-aware molecule generation. IRDiff leverages a curated set of ligand references, i.e., those with desired properties such as high binding affinity, to steer the diffusion model towards synthesizing ligands that satisfy design criteria. Specifically, we utilize a protein-molecule interaction network (PMINet), which is pretrained with binding affinity signals to: (i) retrieve target-aware ligand molecules with high binding affinity to serve as references, and (ii) incorporate essential protein-ligand binding structures for steering molecular diffusion generation with two effective augmentation mechanisms, i.e., retrieval augmentation and self augmentation. Empirical studies on CrossDocked2020 dataset show IRDiff can generate molecules with more realistic 3D structures and achieve state-of-the-art binding affinities towards the protein targets, while maintaining proper molecular properties. The codes and models are available at https://github.com/YangLing0818/IRDiff} }
Endnote
%0 Conference Paper %T Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation %A Zhilin Huang %A Ling Yang %A Xiangxin Zhou %A Chujun Qin %A Yijie Yu %A Xiawu Zheng %A Zikun Zhou %A Wentao Zhang %A Yu Wang %A Wenming Yang %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-huang24ab %I PMLR %P 20348--20364 %U https://proceedings.mlr.press/v235/huang24ab.html %V 235 %X Generating ligand molecules that bind to specific protein targets via generative models holds substantial promise for advancing structure-based drug design. Existing methods generate molecules from scratch without reference or template ligands, which poses challenges in model optimization and may yield suboptimal outcomes. To address this problem, we propose an innovative interaction-based retrieval-augmented diffusion model named IRDiff to facilitate target-aware molecule generation. IRDiff leverages a curated set of ligand references, i.e., those with desired properties such as high binding affinity, to steer the diffusion model towards synthesizing ligands that satisfy design criteria. Specifically, we utilize a protein-molecule interaction network (PMINet), which is pretrained with binding affinity signals to: (i) retrieve target-aware ligand molecules with high binding affinity to serve as references, and (ii) incorporate essential protein-ligand binding structures for steering molecular diffusion generation with two effective augmentation mechanisms, i.e., retrieval augmentation and self augmentation. Empirical studies on CrossDocked2020 dataset show IRDiff can generate molecules with more realistic 3D structures and achieve state-of-the-art binding affinities towards the protein targets, while maintaining proper molecular properties. The codes and models are available at https://github.com/YangLing0818/IRDiff
APA
Huang, Z., Yang, L., Zhou, X., Qin, C., Yu, Y., Zheng, X., Zhou, Z., Zhang, W., Wang, Y. & Yang, W.. (2024). Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:20348-20364 Available from https://proceedings.mlr.press/v235/huang24ab.html.

Related Material