Steering diffusion models with quadratic rewards: a fine-grained analysis

Ankur Moitra, Andrej Risteski, Dhruv Rohatgi
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5188-5209, 2026.

Abstract

Inference-time algorithms are an emerging paradigm in which pre-trained models are used as subroutines to solve downstream tasks. Such algorithms have been proposed for tasks ranging from inverse problems and guided image generation to reasoning. However, the methods currently deployed in practice are heuristics with a variety of failure modes—and we have very little understanding of when these heuristics can be efficiently improved. In this paper, we consider the task of sampling from a reward-tilted diffusion model—that is, sampling from $p^{\star}(x) \propto p(x) \exp(r(x))$—given a reward function $r$ and pre-trained diffusion oracle for $p$. We provide a fine-grained analysis of the computational tractability of this task for quadratic rewards $r(x) = x^\top A x + b^\top x$. We show that linear-reward tilts are always efficiently sampleable—a simple result that seems to have gone unnoticed in the literature. We use this as a building block, along with a conceptually new ingredient—the Hubbard-Stratonovich transform—to provide an efficient algorithm for sampling from low-rank positive-definite quadratic tilts, i.e. $r(x) = x^\top A x$ where $A$ is positive-definite and of rank $O(1)$. For negative-definite tilts, i.e. $r(x) = - x^\top A x$ where $A$ is positive-definite, we prove that the problem is intractable even if $A$ is of rank 1 (albeit with exponentially-large entries).

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-moitra26a, title = {Steering diffusion models with quadratic rewards: a fine-grained analysis}, author = {Moitra, Ankur and Risteski, Andrej and Rohatgi, Dhruv}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {5188--5209}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/moitra26a/moitra26a.pdf}, url = {https://proceedings.mlr.press/v336/moitra26a.html}, abstract = { Inference-time algorithms are an emerging paradigm in which pre-trained models are used as subroutines to solve downstream tasks. Such algorithms have been proposed for tasks ranging from inverse problems and guided image generation to reasoning. However, the methods currently deployed in practice are heuristics with a variety of failure modes—and we have very little understanding of when these heuristics can be efficiently improved. In this paper, we consider the task of sampling from a reward-tilted diffusion model—that is, sampling from $p^{\star}(x) \propto p(x) \exp(r(x))$—given a reward function $r$ and pre-trained diffusion oracle for $p$. We provide a fine-grained analysis of the computational tractability of this task for quadratic rewards $r(x) = x^\top A x + b^\top x$. We show that linear-reward tilts are always efficiently sampleable—a simple result that seems to have gone unnoticed in the literature. We use this as a building block, along with a conceptually new ingredient—the Hubbard-Stratonovich transform—to provide an efficient algorithm for sampling from low-rank positive-definite quadratic tilts, i.e. $r(x) = x^\top A x$ where $A$ is positive-definite and of rank $O(1)$. For negative-definite tilts, i.e. $r(x) = - x^\top A x$ where $A$ is positive-definite, we prove that the problem is intractable even if $A$ is of rank 1 (albeit with exponentially-large entries). } }
Endnote
%0 Conference Paper %T Steering diffusion models with quadratic rewards: a fine-grained analysis %A Ankur Moitra %A Andrej Risteski %A Dhruv Rohatgi %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-moitra26a %I PMLR %P 5188--5209 %U https://proceedings.mlr.press/v336/moitra26a.html %V 336 %X Inference-time algorithms are an emerging paradigm in which pre-trained models are used as subroutines to solve downstream tasks. Such algorithms have been proposed for tasks ranging from inverse problems and guided image generation to reasoning. However, the methods currently deployed in practice are heuristics with a variety of failure modes—and we have very little understanding of when these heuristics can be efficiently improved. In this paper, we consider the task of sampling from a reward-tilted diffusion model—that is, sampling from $p^{\star}(x) \propto p(x) \exp(r(x))$—given a reward function $r$ and pre-trained diffusion oracle for $p$. We provide a fine-grained analysis of the computational tractability of this task for quadratic rewards $r(x) = x^\top A x + b^\top x$. We show that linear-reward tilts are always efficiently sampleable—a simple result that seems to have gone unnoticed in the literature. We use this as a building block, along with a conceptually new ingredient—the Hubbard-Stratonovich transform—to provide an efficient algorithm for sampling from low-rank positive-definite quadratic tilts, i.e. $r(x) = x^\top A x$ where $A$ is positive-definite and of rank $O(1)$. For negative-definite tilts, i.e. $r(x) = - x^\top A x$ where $A$ is positive-definite, we prove that the problem is intractable even if $A$ is of rank 1 (albeit with exponentially-large entries).
APA
Moitra, A., Risteski, A. & Rohatgi, D.. (2026). Steering diffusion models with quadratic rewards: a fine-grained analysis. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5188-5209 Available from https://proceedings.mlr.press/v336/moitra26a.html.

Related Material