Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

Masatoshi Uehara, Xingyu Su, Yulai Zhao, Xiner Li, Aviv Regev, Shuiwang Ji, Sergey Levine, Tommaso Biancalani
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:60515-60529, 2025.

Abstract

To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have recently been proposed, current approaches predominantly focus on single-shot generation, transitioning from fully noised to fully denoised states. We propose a novel framework for inference-time reward optimization with diffusion models. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We provide a theoretical guarantee for our framework and demonstrate its superior empirical performance in protein and DNA design.
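As a rough illustration of the refinement loop described in the abstract, the sketch below alternates partial noising with reward-guided denoising at test time. All names here (`denoise_step`, `reward_fn`, `alphas_cumprod`, `noise_level`) and the best-of-N candidate selection used as the guidance mechanism are illustrative assumptions, not the authors' implementation.

```python
import torch

def noise(x0, t, alphas_cumprod):
    """Forward-diffuse a clean sample x0 up to noise level t (DDPM-style)."""
    a_bar = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps

def reward_guided_denoise(xt, t, denoise_step, reward_fn, num_candidates=8):
    """Denoise from level t back to 0, steering toward high reward by keeping
    the best of several stochastic one-step denoisings (a simple stand-in for
    reward guidance)."""
    x = xt
    for s in reversed(range(1, t + 1)):
        candidates = torch.stack([denoise_step(x, s) for _ in range(num_candidates)])
        rewards = torch.tensor([float(reward_fn(c)) for c in candidates])
        x = candidates[rewards.argmax()]
    return x

def iterative_refinement(x0, denoise_step, reward_fn, alphas_cumprod,
                         num_iters=10, noise_level=200):
    """Alternate (i) partial noising and (ii) reward-guided denoising,
    gradually correcting errors introduced during reward optimization."""
    x = x0
    for _ in range(num_iters):
        xt = noise(x, noise_level, alphas_cumprod)   # step 1: noising
        x = reward_guided_denoise(xt, noise_level,   # step 2: reward-guided
                                  denoise_step, reward_fn)  # denoising
    return x
```

Best-of-N selection is used above only because it is the simplest form of inference-time guidance; the paper's actual reward-guided denoising step may use a different mechanism (e.g., value- or gradient-based guidance).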

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-uehara25a,
  title = {Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and {DNA} Design},
  author = {Uehara, Masatoshi and Su, Xingyu and Zhao, Yulai and Li, Xiner and Regev, Aviv and Ji, Shuiwang and Levine, Sergey and Biancalani, Tommaso},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {60515--60529},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/uehara25a/uehara25a.pdf},
  url = {https://proceedings.mlr.press/v267/uehara25a.html},
  abstract = {To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have recently been proposed, current approaches predominantly focus on single-shot generation, transitioning from fully noised to fully denoised states. We propose a novel framework for inference-time reward optimization with diffusion models. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We provide a theoretical guarantee for our framework and demonstrate its superior empirical performance in protein and DNA design.}
}
Endnote
%0 Conference Paper
%T Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
%A Masatoshi Uehara
%A Xingyu Su
%A Yulai Zhao
%A Xiner Li
%A Aviv Regev
%A Shuiwang Ji
%A Sergey Levine
%A Tommaso Biancalani
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-uehara25a
%I PMLR
%P 60515--60529
%U https://proceedings.mlr.press/v267/uehara25a.html
%V 267
%X To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have recently been proposed, current approaches predominantly focus on single-shot generation, transitioning from fully noised to fully denoised states. We propose a novel framework for inference-time reward optimization with diffusion models. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We provide a theoretical guarantee for our framework and demonstrate its superior empirical performance in protein and DNA design.
APA
Uehara, M., Su, X., Zhao, Y., Li, X., Regev, A., Ji, S., Levine, S. & Biancalani, T. (2025). Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:60515-60529. Available from https://proceedings.mlr.press/v267/uehara25a.html.