Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective

Junwei Yang, Kangjie Zheng, Siyu Long, Zaiqing Nie, Ming Zhang, Xinyu Dai, Wei-Ying Ma, Hao Zhou
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:56793-56811, 2024.

Abstract

3D molecular representation learning has gained tremendous interest and achieved promising performance in various downstream tasks. A series of recent approaches follow a prevalent framework: an encoder-only model coupled with a coordinate denoising objective. However, through a series of analytical experiments, we prove that the encoder-only model with coordinate denoising objective exhibits inconsistency between pre-training and downstream objectives, as well as issues with disrupted atomic identifiers. To address these two issues, we propose Mol-AE for molecular representation learning, an auto-encoder model using positional encoding as atomic identifiers. We also propose a new training objective named 3D Cloze Test to make the model learn better atom spatial relationships from real molecular substructures. Empirical results demonstrate that Mol-AE achieves a large margin performance gain compared to the current state-of-the-art 3D molecular modeling approach.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yang24al,
  title = {Mol-{AE}: Auto-Encoder Based Molecular Representation Learning With 3{D} Cloze Test Objective},
  author = {Yang, Junwei and Zheng, Kangjie and Long, Siyu and Nie, Zaiqing and Zhang, Ming and Dai, Xinyu and Ma, Wei-Ying and Zhou, Hao},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {56793--56811},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yang24al/yang24al.pdf},
  url = {https://proceedings.mlr.press/v235/yang24al.html},
  abstract = {3D molecular representation learning has gained tremendous interest and achieved promising performance in various downstream tasks. A series of recent approaches follow a prevalent framework: an encoder-only model coupled with a coordinate denoising objective. However, through a series of analytical experiments, we prove that the encoder-only model with coordinate denoising objective exhibits inconsistency between pre-training and downstream objectives, as well as issues with disrupted atomic identifiers. To address these two issues, we propose Mol-AE for molecular representation learning, an auto-encoder model using positional encoding as atomic identifiers. We also propose a new training objective named 3D Cloze Test to make the model learn better atom spatial relationships from real molecular substructures. Empirical results demonstrate that Mol-AE achieves a large margin performance gain compared to the current state-of-the-art 3D molecular modeling approach.}
}
Endnote
%0 Conference Paper
%T Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective
%A Junwei Yang
%A Kangjie Zheng
%A Siyu Long
%A Zaiqing Nie
%A Ming Zhang
%A Xinyu Dai
%A Wei-Ying Ma
%A Hao Zhou
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-yang24al
%I PMLR
%P 56793--56811
%U https://proceedings.mlr.press/v235/yang24al.html
%V 235
%X 3D molecular representation learning has gained tremendous interest and achieved promising performance in various downstream tasks. A series of recent approaches follow a prevalent framework: an encoder-only model coupled with a coordinate denoising objective. However, through a series of analytical experiments, we prove that the encoder-only model with coordinate denoising objective exhibits inconsistency between pre-training and downstream objectives, as well as issues with disrupted atomic identifiers. To address these two issues, we propose Mol-AE for molecular representation learning, an auto-encoder model using positional encoding as atomic identifiers. We also propose a new training objective named 3D Cloze Test to make the model learn better atom spatial relationships from real molecular substructures. Empirical results demonstrate that Mol-AE achieves a large margin performance gain compared to the current state-of-the-art 3D molecular modeling approach.
APA
Yang, J., Zheng, K., Long, S., Nie, Z., Zhang, M., Dai, X., Ma, W.-Y., & Zhou, H. (2024). Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:56793-56811. Available from https://proceedings.mlr.press/v235/yang24al.html.
