Towards Physically Reliable Molecular Representation Learning

Seunghoon Yi, Youngwoo Cho, Jinhwan Sul, Seung Woo Ko, Soo Kyung Kim, Jaegul Choo, Hongkee Yoon, Joonseok Lee
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2433-2443, 2023.

Abstract

Estimating the energetic properties of molecular systems is a critical task in material design. Machine learning has shown remarkable promise on this task over classical force fields, but a fully data-driven approach suffers from limited labeled data: not only is the amount of available data small, but the distribution of labeled examples is also highly skewed toward stable states. In this work, we propose a molecular representation learning method that extrapolates well beyond the training distribution, powered by physics-driven parameter estimation from classical energy equations and self-supervised learning inspired by masked language modeling. To ensure the reliability of the proposed model, we introduce a series of novel, multifaceted evaluation schemes that go beyond the energy or force accuracy that has been predominantly used. Through extensive experiments, we demonstrate that the proposed method is effective in discovering molecular structures, outperforming other baselines. Furthermore, we extrapolate it to chemical reaction pathways beyond stable states, taking a step towards physically reliable molecular representation learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-yi23a,
  title = {Towards Physically Reliable Molecular Representation Learning},
  author = {Yi, Seunghoon and Cho, Youngwoo and Sul, Jinhwan and Ko, Seung Woo and Kim, Soo Kyung and Choo, Jaegul and Yoon, Hongkee and Lee, Joonseok},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = {2433--2443},
  year = {2023},
  editor = {Evans, Robin J. and Shpitser, Ilya},
  volume = {216},
  series = {Proceedings of Machine Learning Research},
  month = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v216/yi23a/yi23a.pdf},
  url = {https://proceedings.mlr.press/v216/yi23a.html},
  abstract = {Estimating the energetic properties of molecular systems is a critical task in material design. Machine learning has shown remarkable promise on this task over classical force fields, but a fully data-driven approach suffers from limited labeled data; not just the amount of available data lacks, but the distribution of labeled examples is highly skewed to stable states. In this work, we propose a molecular representation learning method that extrapolates well beyond the training distribution, powered by physics-driven parameter estimation from classical energy equations and self-supervised learning inspired from masked language modeling. To ensure reliability of the proposed model, we introduce a series of novel evaluation schemes in multifaceted ways, beyond the energy or force accuracy that has been dominantly used. From extensive experiments, we demonstrate that the proposed method is effective in discovering molecular structures, outperforming other baselines. Furthermore, we extrapolate it to the chemical reaction pathways beyond stable states, taking a step towards physically reliable molecular representation learning.}
}
Endnote
%0 Conference Paper
%T Towards Physically Reliable Molecular Representation Learning
%A Seunghoon Yi
%A Youngwoo Cho
%A Jinhwan Sul
%A Seung Woo Ko
%A Soo Kyung Kim
%A Jaegul Choo
%A Hongkee Yoon
%A Joonseok Lee
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-yi23a
%I PMLR
%P 2433--2443
%U https://proceedings.mlr.press/v216/yi23a.html
%V 216
%X Estimating the energetic properties of molecular systems is a critical task in material design. Machine learning has shown remarkable promise on this task over classical force fields, but a fully data-driven approach suffers from limited labeled data; not just the amount of available data lacks, but the distribution of labeled examples is highly skewed to stable states. In this work, we propose a molecular representation learning method that extrapolates well beyond the training distribution, powered by physics-driven parameter estimation from classical energy equations and self-supervised learning inspired from masked language modeling. To ensure reliability of the proposed model, we introduce a series of novel evaluation schemes in multifaceted ways, beyond the energy or force accuracy that has been dominantly used. From extensive experiments, we demonstrate that the proposed method is effective in discovering molecular structures, outperforming other baselines. Furthermore, we extrapolate it to the chemical reaction pathways beyond stable states, taking a step towards physically reliable molecular representation learning.
APA
Yi, S., Cho, Y., Sul, J., Ko, S.W., Kim, S.K., Choo, J., Yoon, H. & Lee, J. (2023). Towards Physically Reliable Molecular Representation Learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2433-2443. Available from https://proceedings.mlr.press/v216/yi23a.html.
