CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model

Milong Ren; Tian Zhu; Haicang Zhang

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:42462-42483, 2024.

Abstract

De novo protein design aims to create novel protein structures and sequences unseen in nature. Recent structure-oriented design methods typically employ a two-stage strategy, where structure design and sequence design modules are trained separately, and the backbone structures and sequences are generated sequentially at inference. While diffusion-based generative models like RFdiffusion show great promise in structure design, they face inherent limitations within the two-stage framework. First, the sequence design module risks overfitting, as the accuracy of the generated structures may not align with that of the crystal structures used for training. Second, the sequence design module lacks interaction with the structure design module to further optimize the generated structures. To address these challenges, we propose CarbonNovo, a unified energy-based model for jointly generating protein structure and sequence. Specifically, we leverage a score-based generative model and Markov Random Fields to describe the energy landscape of protein structure and sequence. In CarbonNovo, the structure and sequence design modules communicate at each diffusion step, encouraging the generation of more coherent structure-sequence pairs. Moreover, the unified framework allows for incorporating protein language models as evolutionary constraints on generated proteins. Rigorous evaluation demonstrates that CarbonNovo outperforms two-stage methods across various metrics, including designability, novelty, sequence plausibility, and Rosetta Energy.

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-ren24e,
  title = 	 {{C}arbon{N}ovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model},
  author =       {Ren, Milong and Zhu, Tian and Zhang, Haicang},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {42462--42483},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ren24e/ren24e.pdf},
  url = 	 {https://proceedings.mlr.press/v235/ren24e.html},
  abstract = 	 {De novo protein design aims to create novel protein structures and sequences unseen in nature. Recent structure-oriented design methods typically employ a two-stage strategy, where structure design and sequence design modules are trained separately, and the backbone structures and sequences are generated sequentially in inference. While diffusion-based generative models like RFdiffusion show great promise in structure design, they face inherent limitations within the two-stage framework. First, the sequence design module risks overfitting, as the accuracy of the generated structures may not align with that of the crystal structures used for training. Second, the sequence design module lacks interaction with the structure design module to further optimize the generated structures. To address these challenges, we propose CarbonNovo, a unified energy-based model for jointly generating protein structure and sequence. Specifically, we leverage a score-based generative model and Markov Random Fields for describing the energy landscape of protein structure and sequence. In CarbonNovo, the structure and sequence design module communicates at each diffusion step, encouraging the generation of more coherent structure-sequence pairs. Moreover, the unified framework allows for incorporating the protein language models as evolutionary constraints for generated proteins. The rigorous evaluation demonstrates that CarbonNovo outperforms two-stage methods across various metrics, including designability, novelty, sequence plausibility, and Rosetta Energy.}
}

Endnote

%0 Conference Paper
%T CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model
%A Milong Ren
%A Tian Zhu
%A Haicang Zhang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ren24e
%I PMLR
%P 42462--42483
%U https://proceedings.mlr.press/v235/ren24e.html
%V 235
%X De novo protein design aims to create novel protein structures and sequences unseen in nature. Recent structure-oriented design methods typically employ a two-stage strategy, where structure design and sequence design modules are trained separately, and the backbone structures and sequences are generated sequentially in inference. While diffusion-based generative models like RFdiffusion show great promise in structure design, they face inherent limitations within the two-stage framework. First, the sequence design module risks overfitting, as the accuracy of the generated structures may not align with that of the crystal structures used for training. Second, the sequence design module lacks interaction with the structure design module to further optimize the generated structures. To address these challenges, we propose CarbonNovo, a unified energy-based model for jointly generating protein structure and sequence. Specifically, we leverage a score-based generative model and Markov Random Fields for describing the energy landscape of protein structure and sequence. In CarbonNovo, the structure and sequence design module communicates at each diffusion step, encouraging the generation of more coherent structure-sequence pairs. Moreover, the unified framework allows for incorporating the protein language models as evolutionary constraints for generated proteins. The rigorous evaluation demonstrates that CarbonNovo outperforms two-stage methods across various metrics, including designability, novelty, sequence plausibility, and Rosetta Energy.

APA

Ren, M., Zhu, T., & Zhang, H. (2024). CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:42462-42483. Available from https://proceedings.mlr.press/v235/ren24e.html.
