Offline Model-based Optimization for Real-World Molecular Discovery

Dong-Hee Shin, Young-Han Son, Hyun Jung Lee, Deok-Joong Lee, Tae-Eui Kam
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55205-55254, 2025.

Abstract

Molecular discovery has attracted significant attention in scientific fields for its ability to generate novel molecules with desirable properties. Although numerous methods have been developed to tackle this problem, most rely on an online setting that requires repeated online evaluation of candidate molecules using the oracle. However, in real-world molecular discovery, the oracle is often represented by wet-lab experiments, making this online setting impractical due to the significant time and resource demands. To fill this gap, we propose the Molecular Stitching (MolStitch) framework, which utilizes a fixed offline dataset to explore and optimize molecules without the need for repeated oracle evaluations. Specifically, MolStitch leverages existing molecules from the offline dataset to generate novel ‘stitched molecules’ that combine their desirable properties. These stitched molecules are then used as training samples to fine-tune the generative model using preference optimization techniques. Experimental results on various offline multi-objective molecular optimization problems validate the effectiveness of MolStitch. The source code is available online.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shin25g,
  title = {Offline Model-based Optimization for Real-World Molecular Discovery},
  author = {Shin, Dong-Hee and Son, Young-Han and Lee, Hyun Jung and Lee, Deok-Joong and Kam, Tae-Eui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {55205--55254},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shin25g/shin25g.pdf},
  url = {https://proceedings.mlr.press/v267/shin25g.html},
  abstract = {Molecular discovery has attracted significant attention in scientific fields for its ability to generate novel molecules with desirable properties. Although numerous methods have been developed to tackle this problem, most rely on an online setting that requires repeated online evaluation of candidate molecules using the oracle. However, in real-world molecular discovery, the oracle is often represented by wet-lab experiments, making this online setting impractical due to the significant time and resource demands. To fill this gap, we propose the Molecular Stitching (MolStitch) framework, which utilizes a fixed offline dataset to explore and optimize molecules without the need for repeated oracle evaluations. Specifically, MolStitch leverages existing molecules from the offline dataset to generate novel ‘stitched molecules’ that combine their desirable properties. These stitched molecules are then used as training samples to fine-tune the generative model using preference optimization techniques. Experimental results on various offline multi-objective molecular optimization problems validate the effectiveness of MolStitch. The source code is available online.}
}
Endnote
%0 Conference Paper
%T Offline Model-based Optimization for Real-World Molecular Discovery
%A Dong-Hee Shin
%A Young-Han Son
%A Hyun Jung Lee
%A Deok-Joong Lee
%A Tae-Eui Kam
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shin25g
%I PMLR
%P 55205--55254
%U https://proceedings.mlr.press/v267/shin25g.html
%V 267
%X Molecular discovery has attracted significant attention in scientific fields for its ability to generate novel molecules with desirable properties. Although numerous methods have been developed to tackle this problem, most rely on an online setting that requires repeated online evaluation of candidate molecules using the oracle. However, in real-world molecular discovery, the oracle is often represented by wet-lab experiments, making this online setting impractical due to the significant time and resource demands. To fill this gap, we propose the Molecular Stitching (MolStitch) framework, which utilizes a fixed offline dataset to explore and optimize molecules without the need for repeated oracle evaluations. Specifically, MolStitch leverages existing molecules from the offline dataset to generate novel ‘stitched molecules’ that combine their desirable properties. These stitched molecules are then used as training samples to fine-tune the generative model using preference optimization techniques. Experimental results on various offline multi-objective molecular optimization problems validate the effectiveness of MolStitch. The source code is available online.
APA
Shin, D., Son, Y., Lee, H.J., Lee, D. & Kam, T. (2025). Offline Model-based Optimization for Real-World Molecular Discovery. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55205-55254. Available from https://proceedings.mlr.press/v267/shin25g.html.
