Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46259-46279, 2024.

Abstract

Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme’s amino acid sequence and its three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the Protein Data Bank (PDB). Experimental results show that EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen’s superior capability in designing well-folded and effective enzymes that bind to specific substrates with high affinity. Our code, model and dataset are provided at https://github.com/LeiLiLab/EnzyGen.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-song24k,
  title = {Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates},
  author = {Song, Zhenqiao and Zhao, Yunlong and Shi, Wenxian and Jin, Wengong and Yang, Yang and Li, Lei},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {46259--46279},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/song24k/song24k.pdf},
  url = {https://proceedings.mlr.press/v235/song24k.html},
  abstract = {Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme’s amino acid sequence and their three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen’s superior capability in designing well-folded and effective enzymes binding to specific substrates with high affinities. Our code, model and dataset are provided at https://github.com/LeiLiLab/EnzyGen.}
}
Endnote
%0 Conference Paper
%T Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
%A Zhenqiao Song
%A Yunlong Zhao
%A Wenxian Shi
%A Wengong Jin
%A Yang Yang
%A Lei Li
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-song24k
%I PMLR
%P 46259--46279
%U https://proceedings.mlr.press/v235/song24k.html
%V 235
%X Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme’s amino acid sequence and their three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen’s superior capability in designing well-folded and effective enzymes binding to specific substrates with high affinities. Our code, model and dataset are provided at https://github.com/LeiLiLab/EnzyGen.
APA
Song, Z., Zhao, Y., Shi, W., Jin, W., Yang, Y. & Li, L. (2024). Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46259-46279. Available from https://proceedings.mlr.press/v235/song24k.html.
