A Latent Diffusion Model for Protein Structure Generation

Cong Fu, Keqiang Yan, Limei Wang, Wing Yee Au, Michael Curtis McThrow, Tao Komikado, Koji Maruhashi, Kanji Uchino, Xiaoning Qian, Shuiwang Ji
Proceedings of the Second Learning on Graphs Conference, PMLR 231:29:1-29:17, 2024.

Abstract

Proteins are complex biomolecules that perform a variety of crucial functions within living organisms. Designing and generating novel proteins can pave the way for many future synthetic biology applications, including drug discovery. However, it remains a challenging computational task due to the large modeling space of protein structures. In this study, we propose a latent diffusion model that can reduce the complexity of protein modeling while flexibly capturing the distribution of natural protein structures in a condensed latent space. Specifically, we propose an equivariant protein autoencoder that embeds proteins into a latent space, and then use an equivariant diffusion model to learn the distribution of the latent protein representations. Experimental results demonstrate that our method can effectively generate novel protein backbone structures with high designability and efficiency. The code will be made publicly available at https://github.com/divelab/AIRS/tree/main/OpenProt/LatentDiff

Cite this Paper


BibTeX
@InProceedings{pmlr-v231-fu24a,
  title = {A Latent Diffusion Model for Protein Structure Generation},
  author = {Fu, Cong and Yan, Keqiang and Wang, Limei and Au, Wing Yee and McThrow, Michael Curtis and Komikado, Tao and Maruhashi, Koji and Uchino, Kanji and Qian, Xiaoning and Ji, Shuiwang},
  booktitle = {Proceedings of the Second Learning on Graphs Conference},
  pages = {29:1--29:17},
  year = {2024},
  editor = {Villar, Soledad and Chamberlain, Benjamin},
  volume = {231},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v231/fu24a/fu24a.pdf},
  url = {https://proceedings.mlr.press/v231/fu24a.html},
  abstract = {Proteins are complex biomolecules that perform a variety of crucial functions within living organisms. Designing and generating novel proteins can pave the way for many future synthetic biology applications, including drug discovery. However, it remains a challenging computational task due to the large modeling space of protein structures. In this study, we propose a latent diffusion model that can reduce the complexity of protein modeling while flexibly capturing the distribution of natural protein structures in a condensed latent space. Specifically, we propose an equivariant protein autoencoder that embeds proteins into a latent space, and then use an equivariant diffusion model to learn the distribution of the latent protein representations. Experimental results demonstrate that our method can effectively generate novel protein backbone structures with high designability and efficiency. The code will be made publicly available at https://github.com/divelab/AIRS/tree/main/OpenProt/LatentDiff}
}
Endnote
%0 Conference Paper
%T A Latent Diffusion Model for Protein Structure Generation
%A Cong Fu
%A Keqiang Yan
%A Limei Wang
%A Wing Yee Au
%A Michael Curtis McThrow
%A Tao Komikado
%A Koji Maruhashi
%A Kanji Uchino
%A Xiaoning Qian
%A Shuiwang Ji
%B Proceedings of the Second Learning on Graphs Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Soledad Villar
%E Benjamin Chamberlain
%F pmlr-v231-fu24a
%I PMLR
%P 29:1--29:17
%U https://proceedings.mlr.press/v231/fu24a.html
%V 231
%X Proteins are complex biomolecules that perform a variety of crucial functions within living organisms. Designing and generating novel proteins can pave the way for many future synthetic biology applications, including drug discovery. However, it remains a challenging computational task due to the large modeling space of protein structures. In this study, we propose a latent diffusion model that can reduce the complexity of protein modeling while flexibly capturing the distribution of natural protein structures in a condensed latent space. Specifically, we propose an equivariant protein autoencoder that embeds proteins into a latent space, and then use an equivariant diffusion model to learn the distribution of the latent protein representations. Experimental results demonstrate that our method can effectively generate novel protein backbone structures with high designability and efficiency. The code will be made publicly available at https://github.com/divelab/AIRS/tree/main/OpenProt/LatentDiff
APA
Fu, C., Yan, K., Wang, L., Au, W.Y., McThrow, M.C., Komikado, T., Maruhashi, K., Uchino, K., Qian, X. & Ji, S. (2024). A Latent Diffusion Model for Protein Structure Generation. Proceedings of the Second Learning on Graphs Conference, in Proceedings of Machine Learning Research 231:29:1-29:17. Available from https://proceedings.mlr.press/v231/fu24a.html.
