Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design

Leo Klarner, Tim G. J. Rudner, Garrett M Morris, Charlotte Deane, Yee Whye Teh
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24770-24807, 2024.

Abstract

Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge—with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-klarner24a,
  title = {Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design},
  author = {Klarner, Leo and Rudner, Tim G. J. and Morris, Garrett M and Deane, Charlotte and Teh, Yee Whye},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {24770--24807},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/klarner24a/klarner24a.pdf},
  url = {https://proceedings.mlr.press/v235/klarner24a.html},
  abstract = {Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge—with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.}
}
Endnote
%0 Conference Paper
%T Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design
%A Leo Klarner
%A Tim G. J. Rudner
%A Garrett M Morris
%A Charlotte Deane
%A Yee Whye Teh
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-klarner24a
%I PMLR
%P 24770--24807
%U https://proceedings.mlr.press/v235/klarner24a.html
%V 235
%X Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge—with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.
APA
Klarner, L., Rudner, T.G.J., Morris, G.M., Deane, C. & Teh, Y.W. (2024). Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:24770-24807. Available from https://proceedings.mlr.press/v235/klarner24a.html.
