Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces

Fang Wu, Stan Z. Li
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:53619-53634, 2024.

Abstract

Molecular surfaces encode fingerprints of the interaction patterns between proteins. However, compared with amino acid sequences and 3D structures, far less effort has been devoted to exploiting the abundant information on protein surfaces for analyzing proteins’ biological functions. We propose a novel surface-based unsupervised learning algorithm, termed Surface-VQMAE, to address this gap. Because surface point clouds are sparse and unordered, we first partition them into patches and obtain a sequential arrangement via the Morton curve. Next, a Transformer-based architecture named SurfFormer is introduced to integrate the surface geometry and capture patch-level relations. Finally, we enhance the prevalent masked auto-encoder (MAE) with the vector quantization (VQ) technique, which establishes a surface pattern codebook to enforce a discrete posterior distribution of latent variables and achieve more condensed semantics. Our work is the first to perform pretraining purely on molecular surfaces, and extensive experiments on diverse real-life scenarios, including binding site scoring, binding affinity prediction, and mutant effect estimation, demonstrate its effectiveness. The code is available at https://github.com/smiles724/VQMAE.
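The abstract relies on two discrete operations: ordering surface patches along a Morton (Z-order) curve and snapping continuous latents onto a codebook with vector quantization. The sketch below illustrates both in generic form. It is an assumption-based example, not the authors' code from the linked repository. The function names, the `bits` grid resolution, the bounding-box normalization, the squared Euclidean distance, and the example patch centers are all hypothetical. The VQ training details, such as the straight-through gradient and the commitment loss, are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton_code_3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid indices (x -> bit 0, y -> bit 1, z -> bit 2)."""
    code = 0
    for i in range(bits):
        code |= ((ix >> i) & 1) << (3 * i)
        code |= ((iy >> i) & 1) << (3 * i + 1)
        code |= ((iz >> i) & 1) << (3 * i + 2)
    return code


def quantize_to_grid(points: Sequence[Point], bits: int = 10) -> List[Tuple[int, int, int]]:
    """Map each coordinate to an integer in [0, 2**bits - 1] using the per-axis bounding box."""
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    scale = (1 << bits) - 1
    cells = []
    for p in points:
        cell = []
        for d in range(3):
            span = maxs[d] - mins[d]
            t = (p[d] - mins[d]) / span if span > 0 else 0.0
            cell.append(int(round(t * scale)))
        cells.append((cell[0], cell[1], cell[2]))
    return cells


def morton_order(centers: Sequence[Point], bits: int = 10) -> List[int]:
    """Return patch indices sorted by their Morton code (Z-order curve)."""
    codes = [morton_code_3d(*cell, bits=bits) for cell in quantize_to_grid(centers, bits)]
    return sorted(range(len(centers)), key=lambda i: codes[i])


def vector_quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Replace a continuous latent with its nearest codeword (squared Euclidean distance)."""
    def sq_dist(k: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(z, codebook[k]))

    best = min(range(len(codebook)), key=sq_dist)
    return best, list(codebook[best])


if __name__ == "__main__":
    # Hypothetical patch centers; a real pipeline would take them from the surface patching step.
    centers = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(morton_order(centers, bits=1))  # prints [1, 2, 3, 0]
    print(vector_quantize([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # prints (1, [1.0, 0.0])
```

Sorting by Morton code tends to keep patches that are close in 3D close in the sequence. This gives a Transformer a deterministic token order for an otherwise unordered point cloud. The nearest-codeword lookup is what makes the latent posterior discrete: every patch embedding is represented by one of a finite set of codebook entries.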

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wu24o,
  title = {Surface-{VQMAE}: Vector-quantized Masked Auto-encoders on Molecular Surfaces},
  author = {Wu, Fang and Li, Stan Z.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {53619--53634},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wu24o/wu24o.pdf},
  url = {https://proceedings.mlr.press/v235/wu24o.html},
  abstract = {Molecular surfaces imply fingerprints of interaction patterns between proteins. However, non-equivalent efforts have been paid to incorporating the abundant protein surface information for analyzing proteins’ biological functions in juxtaposition to amino acid sequences and 3D structures. We propose a novel surface-based unsupervised learning algorithm termed Surface-VQMAE to overcome this obstacle. In light of surface point clouds’ sparsity and disorder properties, we first partition them into patches and obtain the sequential arrangement via the Morton curve. Successively, a Transformer-based architecture named SurfFormer was introduced to integrate the surface geometry and capture patch-level relations. At last, we enhance the prevalent masked auto-encoder (MAE) with the vector quantization (VQ) technique, which establishes a surface pattern codebook to enforce a discrete posterior distribution of latent variables and achieve more condensed semantics. Our work is the foremost to implement pretraining purely on molecular surfaces and extensive experiments on diverse real-life scenarios including binding site scoring, binding affinity prediction, and mutant effect estimation demonstrate its effectiveness. The code is available at https://github.com/smiles724/VQMAE.}
}
Endnote
%0 Conference Paper
%T Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces
%A Fang Wu
%A Stan Z. Li
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wu24o
%I PMLR
%P 53619--53634
%U https://proceedings.mlr.press/v235/wu24o.html
%V 235
%X Molecular surfaces imply fingerprints of interaction patterns between proteins. However, non-equivalent efforts have been paid to incorporating the abundant protein surface information for analyzing proteins’ biological functions in juxtaposition to amino acid sequences and 3D structures. We propose a novel surface-based unsupervised learning algorithm termed Surface-VQMAE to overcome this obstacle. In light of surface point clouds’ sparsity and disorder properties, we first partition them into patches and obtain the sequential arrangement via the Morton curve. Successively, a Transformer-based architecture named SurfFormer was introduced to integrate the surface geometry and capture patch-level relations. At last, we enhance the prevalent masked auto-encoder (MAE) with the vector quantization (VQ) technique, which establishes a surface pattern codebook to enforce a discrete posterior distribution of latent variables and achieve more condensed semantics. Our work is the foremost to implement pretraining purely on molecular surfaces and extensive experiments on diverse real-life scenarios including binding site scoring, binding affinity prediction, and mutant effect estimation demonstrate its effectiveness. The code is available at https://github.com/smiles724/VQMAE.
APA
Wu, F., & Li, S. Z. (2024). Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:53619-53634. Available from https://proceedings.mlr.press/v235/wu24o.html.
