3D Infomax improves GNNs for Molecular Property Prediction

Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, Pietro Lió
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20479-20502, 2022.

Abstract

Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications. To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. Our method, called 3D Infomax, maximizes the mutual information between learned 3D summary vectors and the representations of a graph neural network (GNN). During fine-tuning on molecules with unknown geometry, the GNN is still able to produce implicit 3D information and uses it for downstream tasks. We show that 3D Infomax provides significant improvements for a wide range of properties, including a 22% average MAE reduction on QM9 quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-stark22a, title = {3{D} Infomax improves {GNN}s for Molecular Property Prediction}, author = {St{\"a}rk, Hannes and Beaini, Dominique and Corso, Gabriele and Tossou, Prudencio and Dallago, Christian and G{\"u}nnemann, Stephan and Li{\'o}, Pietro}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {20479--20502}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/stark22a/stark22a.pdf}, url = {https://proceedings.mlr.press/v162/stark22a.html}, abstract = {Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications. To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. Our method, called 3D Infomax, maximizes the mutual information between learned 3D summary vectors and the representations of a graph neural network (GNN). During fine-tuning on molecules with unknown geometry, the GNN is still able to produce implicit 3D information and uses it for downstream tasks. We show that 3D Infomax provides significant improvements for a wide range of properties, including a 22% average MAE reduction on QM9 quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.} }
Endnote
%0 Conference Paper %T 3D Infomax improves GNNs for Molecular Property Prediction %A Hannes Stärk %A Dominique Beaini %A Gabriele Corso %A Prudencio Tossou %A Christian Dallago %A Stephan Günnemann %A Pietro Lió %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-stark22a %I PMLR %P 20479--20502 %U https://proceedings.mlr.press/v162/stark22a.html %V 162 %X Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications. To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. Our method, called 3D Infomax, maximizes the mutual information between learned 3D summary vectors and the representations of a graph neural network (GNN). During fine-tuning on molecules with unknown geometry, the GNN is still able to produce implicit 3D information and uses it for downstream tasks. We show that 3D Infomax provides significant improvements for a wide range of properties, including a 22% average MAE reduction on QM9 quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.
APA
Stärk, H., Beaini, D., Corso, G., Tossou, P., Dallago, C., Günnemann, S. & Lió, P.. (2022). 3D Infomax improves GNNs for Molecular Property Prediction. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20479-20502 Available from https://proceedings.mlr.press/v162/stark22a.html.

Related Material