ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization

Xuefeng Liu, Songhao Jiang, Ian Foster, Jinbo Xu, Rick L. Stevens
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.

Abstract

Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging because it requires retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Generative Pretrained Transformer (GPT) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) a three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization; (2) a uniquely designed two-phase incremental training approach for pretraining the drug optimization GPT on molecular scaffolds with enhanced performance; and (3) a token-level decoding optimization strategy, Top-N, that enables controlled, reward-guided generation using the pretrained/finetuned GPT. We demonstrate via a comprehensive evaluation on COVID and cancer benchmarks that ScaffoldGPT outperforms the competing baselines in drug optimization, while excelling at preserving the original functional scaffold and enhancing desired properties.

Cite this Paper


BibTeX
@InProceedings{pmlr-v298-liu25b,
  title     = {Scaffold{GPT}: A Scaffold-based {GPT} Model for Drug Optimization},
  author    = {Liu, Xuefeng and Jiang, Songhao and Foster, Ian and Xu, Jinbo and Stevens, Rick L.},
  booktitle = {Proceedings of the 10th Machine Learning for Healthcare Conference},
  year      = {2025},
  editor    = {Agrawal, Monica and Deshpande, Kaivalya and Engelhard, Matthew and Joshi, Shalmali and Tang, Shengpu and Urteaga, Iñigo},
  volume    = {298},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v298/main/assets/liu25b/liu25b.pdf},
  url       = {https://proceedings.mlr.press/v298/liu25b.html},
  abstract  = {Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging because it requires retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Generative Pretrained Transformer (GPT) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) a three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization; (2) a uniquely designed two-phase incremental training approach for pretraining the drug optimization GPT on molecular scaffolds with enhanced performance; and (3) a token-level decoding optimization strategy, Top-N, that enables controlled, reward-guided generation using the pretrained/finetuned GPT. We demonstrate via a comprehensive evaluation on COVID and cancer benchmarks that ScaffoldGPT outperforms the competing baselines in drug optimization, while excelling at preserving the original functional scaffold and enhancing desired properties.}
}
Endnote
%0 Conference Paper
%T ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization
%A Xuefeng Liu
%A Songhao Jiang
%A Ian Foster
%A Jinbo Xu
%A Rick L. Stevens
%B Proceedings of the 10th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Monica Agrawal
%E Kaivalya Deshpande
%E Matthew Engelhard
%E Shalmali Joshi
%E Shengpu Tang
%E Iñigo Urteaga
%F pmlr-v298-liu25b
%I PMLR
%U https://proceedings.mlr.press/v298/liu25b.html
%V 298
%X Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging because it requires retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Generative Pretrained Transformer (GPT) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) a three-stage drug optimization approach that integrates pretraining, finetuning, and decoding optimization; (2) a uniquely designed two-phase incremental training approach for pretraining the drug optimization GPT on molecular scaffolds with enhanced performance; and (3) a token-level decoding optimization strategy, Top-N, that enables controlled, reward-guided generation using the pretrained/finetuned GPT. We demonstrate via a comprehensive evaluation on COVID and cancer benchmarks that ScaffoldGPT outperforms the competing baselines in drug optimization, while excelling at preserving the original functional scaffold and enhancing desired properties.
APA
Liu, X., Jiang, S., Foster, I., Xu, J., & Stevens, R. L. (2025). ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization. Proceedings of the 10th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 298. Available from https://proceedings.mlr.press/v298/liu25b.html.
