Multi-Objective Molecule Generation using Interpretable Substructures

Wengong Jin, Regina Barzilay, Tommi Jaakkola
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4849-4859, 2020.

Abstract

Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-jin20b,
  title     = {Multi-Objective Molecule Generation using Interpretable Substructures},
  author    = {Jin, Wengong and Barzilay, Regina and Jaakkola, Tommi},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {4849--4859},
  year      = {2020},
  editor    = {Daumé III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/jin20b/jin20b.pdf},
  url       = {http://proceedings.mlr.press/v119/jin20b.html},
  abstract  = {Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.}
}
Endnote
%0 Conference Paper
%T Multi-Objective Molecule Generation using Interpretable Substructures
%A Wengong Jin
%A Regina Barzilay
%A Tommi Jaakkola
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-jin20b
%I PMLR
%P 4849--4859
%U http://proceedings.mlr.press/v119/jin20b.html
%V 119
%X Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.
APA
Jin, W., Barzilay, R., & Jaakkola, T. (2020). Multi-Objective Molecule Generation using Interpretable Substructures. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4849-4859. Available from http://proceedings.mlr.press/v119/jin20b.html.
