CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Zhantao Yang, Ruili Feng, Yu Liu, Xueyang Fu, Zheng-Jun Zha
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:54382-54401, 2024.

Abstract

Consistency Models (CMs) have shown promise in creating high-quality images in only a few steps. However, how to add new conditional controls to pre-trained CMs has not been explored. In this paper, we study how to leverage the generative capacity and efficiency of consistency models for controllable visual content creation via ControlNet. First, we observe that a ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic control, but it sacrifices low-level image detail and realism. To tackle this issue, we develop a CM-tailored training strategy for ControlNet based on consistency training, and show that ControlNet can be successfully established through this technique. In addition, a unified adapter can be trained via consistency training to improve the transfer of a DM-trained ControlNet. We quantitatively and qualitatively evaluate all strategies across various conditional controls, including sketch, HED, Canny, depth, human pose, low-resolution image, and masked image, with pre-trained text-to-image latent consistency models.
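
For intuition, below is a minimal, self-contained PyTorch sketch of the discrete-time consistency training objective (Song et al., 2023) extended with a conditional control input. Everything here is illustrative: the toy network, the geometric time grid, and the random control embedding are assumptions made for the example, whereas the paper trains a ControlNet branch on a pre-trained text-to-image latent consistency model.

```python
import copy
import torch
import torch.nn as nn

SIGMA_DATA, EPS = 0.5, 0.002

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / torch.sqrt(t**2 + SIGMA_DATA**2)

def f(net, x, t, ctrl):
    # Consistency parameterization f(x, t) = c_skip(t) * x + c_out(t) * F(x, t),
    # which enforces the boundary condition f(x, EPS) = x.
    return c_skip(t)[:, None] * x + c_out(t)[:, None] * net(x, t, ctrl)

class ToyControlNet(nn.Module):
    # Hypothetical stand-in for a ControlNet-augmented backbone.
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 64), nn.SiLU(),
                                 nn.Linear(64, dim))

    def forward(self, x, t, ctrl):
        return self.net(torch.cat([x, ctrl, t[:, None]], dim=-1))

model = ToyControlNet()
target = copy.deepcopy(model)  # EMA target network theta^-
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

N, T, mu = 40, 80.0, 0.95
# Assumed geometric time grid; the CM paper uses a Karras-style schedule.
ts = EPS * (T / EPS) ** (torch.arange(N + 1) / N)

for step in range(100):
    x0 = torch.randn(32, 8)    # stand-in for data (or latents)
    ctrl = torch.randn(32, 8)  # stand-in for the control embedding (e.g. an edge map)
    n = torch.randint(0, N, (32,))
    z = torch.randn_like(x0)   # the SAME noise at two adjacent timesteps
    pred = f(model, x0 + ts[n + 1][:, None] * z, ts[n + 1], ctrl)
    with torch.no_grad():
        tgt = f(target, x0 + ts[n][:, None] * z, ts[n], ctrl)
    loss = ((pred - tgt) ** 2).mean()  # squared-l2 distance, uniform weighting
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # EMA update: theta^- <- mu * theta^- + (1 - mu) * theta
        for p, q in zip(model.parameters(), target.parameters()):
            q.mul_(mu).add_(p, alpha=1 - mu)
```

In the paper's actual setting, only the ControlNet branch would be optimized while the pre-trained consistency model backbone stays frozen; the toy loop above trains all parameters for brevity.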

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xiao24h,
  title     = {{CCM}: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models},
  author    = {Xiao, Jie and Zhu, Kai and Zhang, Han and Liu, Zhiheng and Shen, Yujun and Yang, Zhantao and Feng, Ruili and Liu, Yu and Fu, Xueyang and Zha, Zheng-Jun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {54382--54401},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xiao24h/xiao24h.pdf},
  url       = {https://proceedings.mlr.press/v235/xiao24h.html},
  abstract  = {Consistency Models (CMs) have shown promise in creating high-quality images in only a few steps. However, how to add new conditional controls to pre-trained CMs has not been explored. In this paper, we study how to leverage the generative capacity and efficiency of consistency models for controllable visual content creation via ControlNet. First, we observe that a ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic control, but it sacrifices low-level image detail and realism. To tackle this issue, we develop a CM-tailored training strategy for ControlNet based on consistency training, and show that ControlNet can be successfully established through this technique. In addition, a unified adapter can be trained via consistency training to improve the transfer of a DM-trained ControlNet. We quantitatively and qualitatively evaluate all strategies across various conditional controls, including sketch, HED, Canny, depth, human pose, low-resolution image, and masked image, with pre-trained text-to-image latent consistency models.}
}
Endnote
%0 Conference Paper
%T CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models
%A Jie Xiao
%A Kai Zhu
%A Han Zhang
%A Zhiheng Liu
%A Yujun Shen
%A Zhantao Yang
%A Ruili Feng
%A Yu Liu
%A Xueyang Fu
%A Zheng-Jun Zha
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xiao24h
%I PMLR
%P 54382--54401
%U https://proceedings.mlr.press/v235/xiao24h.html
%V 235
%X Consistency Models (CMs) have shown promise in creating high-quality images in only a few steps. However, how to add new conditional controls to pre-trained CMs has not been explored. In this paper, we study how to leverage the generative capacity and efficiency of consistency models for controllable visual content creation via ControlNet. First, we observe that a ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic control, but it sacrifices low-level image detail and realism. To tackle this issue, we develop a CM-tailored training strategy for ControlNet based on consistency training, and show that ControlNet can be successfully established through this technique. In addition, a unified adapter can be trained via consistency training to improve the transfer of a DM-trained ControlNet. We quantitatively and qualitatively evaluate all strategies across various conditional controls, including sketch, HED, Canny, depth, human pose, low-resolution image, and masked image, with pre-trained text-to-image latent consistency models.
APA
Xiao, J., Zhu, K., Zhang, H., Liu, Z., Shen, Y., Yang, Z., Feng, R., Liu, Y., Fu, X. & Zha, Z. (2024). CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:54382-54401. Available from https://proceedings.mlr.press/v235/xiao24h.html.