[edit]
Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:20348-20364, 2024.
Abstract
Generating ligand molecules that bind to specific protein targets via generative models holds substantial promise for advancing structure-based drug design. Existing methods generate molecules from scratch without reference or template ligands, which poses challenges in model optimization and may yield suboptimal outcomes. To address this problem, we propose an innovative interaction-based retrieval-augmented diffusion model named IRDiff to facilitate target-aware molecule generation. IRDiff leverages a curated set of ligand references, i.e., those with desired properties such as high binding affinity, to steer the diffusion model towards synthesizing ligands that satisfy design criteria. Specifically, we utilize a protein-molecule interaction network (PMINet), which is pretrained with binding affinity signals to: (i) retrieve target-aware ligand molecules with high binding affinity to serve as references, and (ii) incorporate essential protein-ligand binding structures for steering molecular diffusion generation with two effective augmentation mechanisms, i.e., retrieval augmentation and self augmentation. Empirical studies on CrossDocked2020 dataset show IRDiff can generate molecules with more realistic 3D structures and achieve state-of-the-art binding affinities towards the protein targets, while maintaining proper molecular properties. The codes and models are available at https://github.com/YangLing0818/IRDiff