ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling

Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61432-61453, 2024.

Abstract

Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating that it more fully exploits the capabilities of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins.
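
To make the idea of a code-switch protein sequence concrete, below is a minimal, hypothetical Python sketch: randomly chosen residues are "unzipped" into their constituent atoms, and every token carries both a residue-scale position (shared by all atoms of the same residue) and an atom-scale position within the residue. The function name, the toy atom inventories, and the unzipping policy are illustrative assumptions, not the paper's implementation.

    # Hypothetical sketch of code-switch sequence construction, not the
    # paper's actual code: some residues are "unzipped" into atom tokens,
    # yielding a mixed residue/atom sequence with multi-scale positions.
    import random

    # Toy per-residue atom inventories (real side chains have more atoms).
    RESIDUE_ATOMS = {
        "A": ["N", "CA", "C", "O", "CB"],
        "G": ["N", "CA", "C", "O"],
    }

    def build_code_switch_sequence(residues, unzip_prob=0.3, seed=0):
        """Randomly unzip residues into atom tokens.

        Returns (tokens, residue_pos, atom_pos):
          - tokens: mixed residue/atom tokens
          - residue_pos: residue-scale position, shared by all atoms of
            the same residue
          - atom_pos: atom-scale position inside an unzipped residue
            (0 for tokens kept at the residue scale)
        """
        rng = random.Random(seed)
        tokens, residue_pos, atom_pos = [], [], []
        for i, res in enumerate(residues):
            if res in RESIDUE_ATOMS and rng.random() < unzip_prob:
                # Unzip: replace the residue token with its atom tokens.
                for j, atom in enumerate(RESIDUE_ATOMS[res]):
                    tokens.append(atom)
                    residue_pos.append(i)  # atoms inherit the residue index
                    atom_pos.append(j)
            else:
                # Keep the residue as a single residue-scale token.
                tokens.append(res)
                residue_pos.append(i)
                atom_pos.append(0)
        return tokens, residue_pos, atom_pos

    tokens, rpos, apos = build_code_switch_sequence(list("AGAG"), unzip_prob=0.5)
    print(list(zip(tokens, rpos, apos)))

The plain integer indices here are only a stand-in: they show how atoms of one residue share a residue-scale position, while the paper's multi-scale position encoding is designed to capture richer relationships among residues and atoms than sequence indices alone can express.
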

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zheng24h,
  title     = {{ESM} All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling},
  author    = {Zheng, Kangjie and Long, Siyu and Lu, Tianyu and Yang, Junwei and Dai, Xinyu and Zhang, Ming and Nie, Zaiqing and Ma, Wei-Ying and Zhou, Hao},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {61432--61453},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zheng24h/zheng24h.pdf},
  url       = {https://proceedings.mlr.press/v235/zheng24h.html},
  abstract  = {Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins.}
}
Endnote
%0 Conference Paper
%T ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling
%A Kangjie Zheng
%A Siyu Long
%A Tianyu Lu
%A Junwei Yang
%A Xinyu Dai
%A Ming Zhang
%A Zaiqing Nie
%A Wei-Ying Ma
%A Hao Zhou
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zheng24h
%I PMLR
%P 61432--61453
%U https://proceedings.mlr.press/v235/zheng24h.html
%V 235
%X Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins.
APA
Zheng, K., Long, S., Lu, T., Yang, J., Dai, X., Zhang, M., Nie, Z., Ma, W.-Y. & Zhou, H. (2024). ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:61432-61453. Available from https://proceedings.mlr.press/v235/zheng24h.html.
