BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization

Dongmin Bang, Inyoung Sung, Yinhua Piao, Sangseon Lee, Sun Kim
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:2858-2893, 2025.

Abstract

The advent of generative AI now enables large-scale $\textit{de novo}$ design of molecules, but identifying viable drug candidates among them remains an open problem. Existing drug-likeness prediction methods often rely on ambiguous negative sets or purely structural features, limiting their ability to accurately distinguish drugs from non-drugs. In this work, we introduce BounDr.E, which models drug-likeness as a compact space surrounding approved drugs through a dynamic one-class boundary approach. Specifically, we enrich the chemical space through biomedical knowledge alignment, and then iteratively tighten the drug-like boundary by pushing non-drug-like compounds outside via an Expectation-Maximization (EM)-like process. Empirically, BounDr.E achieves a 10% F1-score improvement over the previous state-of-the-art and demonstrates robust cross-dataset performance, including zero-shot toxic compound filtering. Additionally, we showcase its effectiveness through comprehensive case studies in large-scale $\textit{in silico}$ screening. Our code and constructed benchmark data under various schemes are provided at: https://github.com/eugenebang/boundr_e.
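To make the one-class boundary idea concrete, the following is a toy sketch only and not the authors' implementation. It assumes molecules are already embedded as fixed vectors, models the boundary as a hypersphere, and alternates between setting the radius and re-estimating the center. The learned encoder, the knowledge alignment step, and the pushing of non-drug-like compounds described in the abstract are all omitted. The function names and the `keep` parameter are hypothetical.

```python
import numpy as np


def fit_one_class_boundary(drug_emb, keep=0.95, n_iter=20):
    """Fit a hypersphere (center, radius) around known-drug embeddings.

    Toy alternating scheme on FIXED embeddings (an assumption of this sketch):
      - radius step: cover a fraction `keep` of the drugs;
      - center step: re-estimate the center from drugs inside the boundary.
    """
    center = drug_emb.mean(axis=0)
    dist = np.linalg.norm(drug_emb - center, axis=1)
    radius = float(np.quantile(dist, keep))
    for _ in range(n_iter):
        # Center step: ignore drugs that currently fall outside the boundary.
        inside = dist <= radius
        new_center = drug_emb[inside].mean(axis=0)
        converged = np.allclose(new_center, center)
        center = new_center
        # Radius step: recompute distances and cover `keep` of all drugs.
        dist = np.linalg.norm(drug_emb - center, axis=1)
        radius = float(np.quantile(dist, keep))
        if converged:
            break
    return center, radius


def is_drug_like(emb, center, radius):
    """One-class decision: True for points inside the fitted boundary."""
    return np.linalg.norm(emb - center, axis=1) <= radius


# Synthetic demo data (hypothetical, not from the paper's benchmarks).
rng = np.random.default_rng(0)
drugs = rng.normal(0.0, 1.0, size=(200, 2))      # stand-in for approved drugs
candidates = rng.normal(10.0, 1.0, size=(50, 2))  # stand-in for far-away compounds
center, radius = fit_one_class_boundary(drugs)
flags = is_drug_like(candidates, center, radius)
```

Only the approved-drug set is needed to fit the boundary here, which is the sense in which the approach is one-class: candidates are scored by whether they fall inside the region, without requiring a curated negative set.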

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bang25a,
  title = {{B}oun{D}r.{E}: Predicting Drug-likeness via Biomedical Knowledge Alignment and {EM}-like One-Class Boundary Optimization},
  author = {Bang, Dongmin and Sung, Inyoung and Piao, Yinhua and Lee, Sangseon and Kim, Sun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {2858--2893},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bang25a/bang25a.pdf},
  url = {https://proceedings.mlr.press/v267/bang25a.html},
  abstract = {The advent of generative AI now enables large-scale $\textit{de novo}$ design of molecules, but identifying viable drug candidates among them remains an open problem. Existing drug-likeness prediction methods often rely on ambiguous negative sets or purely structural features, limiting their ability to accurately classify drugs from non-drugs. In this work, we introduce BounDr.E: a novel modeling of drug-likeness as a compact space surrounding approved drugs through a dynamic one-class boundary approach. Specifically, we enrich the chemical space through biomedical knowledge alignment, and then iteratively tighten the drug-like boundary by pushing non-drug-like compounds outside via an Expectation-Maximization (EM)-like process. Empirically, BounDr.E achieves 10% F1-score improvement over the previous state-of-the-art and demonstrates robust cross-dataset performance, including zero-shot toxic compound filtering. Additionally, we showcase its effectiveness through comprehensive case studies in large-scale $\textit{in silico}$ screening. Our codes and constructed benchmark data under various schemes are provided at: https://github.com/eugenebang/boundr_e.}
}
Endnote
%0 Conference Paper
%T BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization
%A Dongmin Bang
%A Inyoung Sung
%A Yinhua Piao
%A Sangseon Lee
%A Sun Kim
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-bang25a
%I PMLR
%P 2858--2893
%U https://proceedings.mlr.press/v267/bang25a.html
%V 267
%X The advent of generative AI now enables large-scale $\textit{de novo}$ design of molecules, but identifying viable drug candidates among them remains an open problem. Existing drug-likeness prediction methods often rely on ambiguous negative sets or purely structural features, limiting their ability to accurately classify drugs from non-drugs. In this work, we introduce BounDr.E: a novel modeling of drug-likeness as a compact space surrounding approved drugs through a dynamic one-class boundary approach. Specifically, we enrich the chemical space through biomedical knowledge alignment, and then iteratively tighten the drug-like boundary by pushing non-drug-like compounds outside via an Expectation-Maximization (EM)-like process. Empirically, BounDr.E achieves 10% F1-score improvement over the previous state-of-the-art and demonstrates robust cross-dataset performance, including zero-shot toxic compound filtering. Additionally, we showcase its effectiveness through comprehensive case studies in large-scale $\textit{in silico}$ screening. Our codes and constructed benchmark data under various schemes are provided at: https://github.com/eugenebang/boundr_e.
APA
Bang, D., Sung, I., Piao, Y., Lee, S., & Kim, S. (2025). BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:2858-2893. Available from https://proceedings.mlr.press/v267/bang25a.html.

Related Material