Prompt-tuning Latent Diffusion Models for Inverse Problems

Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:8941-8967, 2024.

Abstract

We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To improve upon this, we introduce a method for prompt tuning, which jointly optimizes the text embedding on the fly while running the reverse diffusion. This allows us to generate images that are more faithful to the diffusion prior. Specifically, our approach involves a unified optimization framework that simultaneously considers the prompt, latent, and pixel values through alternating minimization. This significantly diminishes image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our method, called P2L, outperforms both pixel- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting. Furthermore, P2L demonstrates remarkable scalability to higher resolutions without artifacts.
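To make the alternating-minimization idea in the abstract concrete, the following Python/PyTorch sketch interleaves a prompt-embedding update, a latent update, and a reverse-diffusion step. It is only an illustration under stated assumptions: ToyDenoiser, ToyDecoder, the forward operator A, the noise schedule, and all step sizes are hypothetical stand-ins, not the authors' actual P2L implementation.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

class ToyDenoiser(torch.nn.Module):
    # Stand-in for the latent-diffusion U-Net, conditioned on a text embedding c.
    def __init__(self, ch=4, emb_dim=8):
        super().__init__()
        self.conv = torch.nn.Conv2d(ch, ch, 3, padding=1)
        self.proj = torch.nn.Linear(emb_dim, ch)

    def forward(self, z, c):
        bias = self.proj(c).view(1, -1, 1, 1)  # inject the tunable embedding
        return self.conv(z) + bias

class ToyDecoder(torch.nn.Module):
    # Stand-in for the VAE decoder mapping 8x8 latents to 64x64 RGB pixels.
    def __init__(self, ch=4):
        super().__init__()
        self.conv = torch.nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, z):
        return self.conv(F.interpolate(z, scale_factor=8, mode="nearest"))

def A(x):
    # Hypothetical forward operator: 4x downsampling (a super-resolution task).
    return F.avg_pool2d(x, 4)

eps_net, decode = ToyDenoiser(), ToyDecoder()
y = torch.randn(1, 3, 16, 16)              # observed degraded measurement
z = torch.randn(1, 4, 8, 8)                # noisy latent z_T
c = torch.zeros(1, 8, requires_grad=True)  # text embedding, tuned on the fly
abar = torch.linspace(0.02, 0.999, 50)     # toy alpha-bar schedule (reverse time)

for t, a in enumerate(abar):
    def z0_hat(z_t, emb):
        # Tweedie-style estimate of the clean latent at the current noise level.
        return (z_t - (1 - a).sqrt() * eps_net(z_t, emb)) / a.sqrt()

    # (1) Prompt step: descend the data-fidelity loss w.r.t. the text embedding.
    loss_c = F.mse_loss(A(decode(z0_hat(z, c))), y)
    (g_c,) = torch.autograd.grad(loss_c, c)
    c = (c - 0.1 * g_c).detach().requires_grad_(True)

    # (2) Latent step: descend the same fidelity term w.r.t. the latent itself.
    z = z.detach().requires_grad_(True)
    loss_z = F.mse_loss(A(decode(z0_hat(z, c))), y)
    (g_z,) = torch.autograd.grad(loss_z, z)

    with torch.no_grad():
        z0 = z0_hat(z, c) - 1.0 * g_z  # fidelity-corrected clean estimate
        # (3) A pixel step would further refine decode(z0) against y and
        #     re-encode it; omitted to keep the sketch short.
        a_next = abar[t + 1] if t + 1 < len(abar) else torch.tensor(1.0)
        z = a_next.sqrt() * z0 + (1 - a_next).sqrt() * eps_net(z, c)  # DDIM-style update

x_hat = decode(z.detach())  # reconstructed image

The point of the sketch is the interleaving: each reverse-diffusion step refreshes the prompt embedding before enforcing data consistency in latent (and, in the full method, pixel) space.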

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-chung24b,
  title     = {Prompt-tuning Latent Diffusion Models for Inverse Problems},
  author    = {Chung, Hyungjin and Ye, Jong Chul and Milanfar, Peyman and Delbracio, Mauricio},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {8941--8967},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chung24b/chung24b.pdf},
  url       = {https://proceedings.mlr.press/v235/chung24b.html}
}
Endnote
%0 Conference Paper
%T Prompt-tuning Latent Diffusion Models for Inverse Problems
%A Hyungjin Chung
%A Jong Chul Ye
%A Peyman Milanfar
%A Mauricio Delbracio
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-chung24b
%I PMLR
%P 8941--8967
%U https://proceedings.mlr.press/v235/chung24b.html
%V 235
APA
Chung, H., Ye, J. C., Milanfar, P., & Delbracio, M. (2024). Prompt-tuning Latent Diffusion Models for Inverse Problems. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:8941-8967. Available from https://proceedings.mlr.press/v235/chung24b.html.
