Representing Molecules as Random Walks Over Interpretable Grammars

Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46988-47016, 2024.

Abstract

Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method’s chemical interpretability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-sun24c,
  title = {Representing Molecules as Random Walks Over Interpretable Grammars},
  author = {Sun, Michael and Guo, Minghao and Yuan, Weize and Thost, Veronika and Owens, Crystal Elaine and Grosz, Aristotle Franklin and Selvan, Sharvaa and Zhou, Katelyn and Mohiuddin, Hassan and Pedretti, Benjamin J and Smith, Zachary P and Chen, Jie and Matusik, Wojciech},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {46988--47016},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/sun24c/sun24c.pdf},
  url = {https://proceedings.mlr.press/v235/sun24c.html},
  abstract = {Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method’s chemical interpretability.}
}
Endnote
%0 Conference Paper
%T Representing Molecules as Random Walks Over Interpretable Grammars
%A Michael Sun
%A Minghao Guo
%A Weize Yuan
%A Veronika Thost
%A Crystal Elaine Owens
%A Aristotle Franklin Grosz
%A Sharvaa Selvan
%A Katelyn Zhou
%A Hassan Mohiuddin
%A Benjamin J Pedretti
%A Zachary P Smith
%A Jie Chen
%A Wojciech Matusik
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-sun24c
%I PMLR
%P 46988--47016
%U https://proceedings.mlr.press/v235/sun24c.html
%V 235
%X Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method’s chemical interpretability.
APA
Sun, M., Guo, M., Yuan, W., Thost, V., Owens, C.E., Grosz, A.F., Selvan, S., Zhou, K., Mohiuddin, H., Pedretti, B.J., Smith, Z.P., Chen, J. & Matusik, W. (2024). Representing Molecules as Random Walks Over Interpretable Grammars. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46988-47016. Available from https://proceedings.mlr.press/v235/sun24c.html.
