What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much?

Rafal Karczewski, Samuel Kaski, Markus Heinonen, Vikas K Garg
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3187-3195, 2025.

Abstract

Several generative models with elaborate training and sampling procedures have been proposed to accelerate structure-based drug design (SBDD); however, their empirical performance turns out to be suboptimal. We seek to better understand this phenomenon from both theoretical and empirical perspectives. Since most of these models apply graph neural networks (GNNs), one may suspect that they inherit the representational limitations of GNNs. We analyze this aspect, establishing the first such results for protein-ligand complexes. A plausible counterview may attribute the underperformance of these models to their excessive parameterizations, inducing expressivity at the expense of generalization. We investigate this possibility with a simple metric-aware approach that learns an economical surrogate for affinity to infer an unlabelled molecular graph and optimizes for labels conditioned on this graph and molecular properties. The resulting model achieves state-of-the-art results using 100x fewer trainable parameters and affords up to 1000x speedup. Collectively, our findings underscore the need to reassess and redirect the existing paradigm and efforts for SBDD. Code is available at https://github.com/rafalkarczewski/SimpleSBDD.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-karczewski25a,
  title = {What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much?},
  author = {Karczewski, Rafal and Kaski, Samuel and Heinonen, Markus and Garg, Vikas K},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {3187--3195},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/karczewski25a/karczewski25a.pdf},
  url = {https://proceedings.mlr.press/v258/karczewski25a.html},
  abstract = {Several generative models with elaborate training and sampling procedures have been proposed to accelerate structure-based drug design (SBDD); however, their empirical performance turns out to be suboptimal. We seek to better understand this phenomenon from both theoretical and empirical perspectives. Since most of these models apply graph neural networks (GNNs), one may suspect that they inherit the representational limitations of GNNs. We analyze this aspect, establishing the first such results for protein-ligand complexes. A plausible counterview may attribute the underperformance of these models to their excessive parameterizations, inducing expressivity at the expense of generalization. We investigate this possibility with a simple metric-aware approach that learns an economical surrogate for affinity to infer an unlabelled molecular graph and optimizes for labels conditioned on this graph and molecular properties. The resulting model achieves state-of-the-art results using 100x fewer trainable parameters and affords up to 1000x speedup. Collectively, our findings underscore the need to reassess and redirect the existing paradigm and efforts for SBDD. Code is available at \url{https://github.com/rafalkarczewski/SimpleSBDD}.}
}
Endnote
%0 Conference Paper
%T What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much?
%A Rafal Karczewski
%A Samuel Kaski
%A Markus Heinonen
%A Vikas K Garg
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-karczewski25a
%I PMLR
%P 3187--3195
%U https://proceedings.mlr.press/v258/karczewski25a.html
%V 258
%X Several generative models with elaborate training and sampling procedures have been proposed to accelerate structure-based drug design (SBDD); however, their empirical performance turns out to be suboptimal. We seek to better understand this phenomenon from both theoretical and empirical perspectives. Since most of these models apply graph neural networks (GNNs), one may suspect that they inherit the representational limitations of GNNs. We analyze this aspect, establishing the first such results for protein-ligand complexes. A plausible counterview may attribute the underperformance of these models to their excessive parameterizations, inducing expressivity at the expense of generalization. We investigate this possibility with a simple metric-aware approach that learns an economical surrogate for affinity to infer an unlabelled molecular graph and optimizes for labels conditioned on this graph and molecular properties. The resulting model achieves state-of-the-art results using 100x fewer trainable parameters and affords up to 1000x speedup. Collectively, our findings underscore the need to reassess and redirect the existing paradigm and efforts for SBDD. Code is available at https://github.com/rafalkarczewski/SimpleSBDD.
APA
Karczewski, R., Kaski, S., Heinonen, M. & Garg, V.K. (2025). What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much? Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3187-3195. Available from https://proceedings.mlr.press/v258/karczewski25a.html.
