Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:12055-12076, 2023.

Abstract

The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-guo23h,
  title     = {Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction},
  author    = {Guo, Minghao and Thost, Veronika and Song, Samuel W and Balachandran, Adithya and Das, Payel and Chen, Jie and Matusik, Wojciech},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {12055--12076},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/guo23h/guo23h.pdf},
  url       = {https://proceedings.mlr.press/v202/guo23h.html},
  abstract  = {The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data.}
}
Endnote
%0 Conference Paper
%T Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction
%A Minghao Guo
%A Veronika Thost
%A Samuel W Song
%A Adithya Balachandran
%A Payel Das
%A Jie Chen
%A Wojciech Matusik
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-guo23h
%I PMLR
%P 12055--12076
%U https://proceedings.mlr.press/v202/guo23h.html
%V 202
%X The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data.
APA
Guo, M., Thost, V., Song, S. W., Balachandran, A., Das, P., Chen, J., & Matusik, W. (2023). Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:12055-12076. Available from https://proceedings.mlr.press/v202/guo23h.html.
