Composer: Creative and Controllable Image Synthesis with Composable Conditions

Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:13753-13773, 2023.

Abstract

Recent large-scale generative models trained on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as the conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., growing exponentially with the number of decomposed factors) for customizable content creation. It is noteworthy that our approach, which we call Composer, supports various levels of conditions, such as text description as the global information, depth map and sketch as the local guidance, color histogram for low-level details, etc. Besides improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
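
The decompose-then-recompose recipe described in the abstract is easy to illustrate. Below is a minimal, self-contained PyTorch sketch of the idea: extract several factors from an image, then randomly drop some of them during denoising training so that any subset of conditions can steer generation later. Every name here (extract_factors, TinyDenoiser, the toy noise schedule) is a hypothetical stand-in, and the independent condition dropout is an assumption inspired by classifier-free guidance, not the authors' released code or exact recipe.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def extract_factors(img):
    # Cheap stand-ins for the paper's decompositions (depth, sketch, palette, ...).
    return {
        "palette": img.mean(dim=(2, 3), keepdim=True).expand_as(img),       # global color statistics
        "layout": F.interpolate(F.avg_pool2d(img, 8), size=img.shape[2:]),  # coarse spatial structure
    }

class TinyDenoiser(nn.Module):
    # Conditions by concatenating the noisy image, the (possibly zeroed)
    # factor maps, and a broadcast timestep channel.
    def __init__(self, ch=3, n_factors=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch * (1 + n_factors) + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x, t, conds):
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, *conds, t_map], dim=1))

def train_step(model, img, drop_prob=0.5):
    factors = extract_factors(img)
    # Drop each factor independently so that, at inference, any subset of
    # conditions (a design space exponential in the number of factors)
    # remains a valid input to the model.
    conds = [v if random.random() > drop_prob else torch.zeros_like(v)
             for v in factors.values()]
    t = torch.rand(img.shape[0])                      # toy timestep in [0, 1]
    noise = torch.randn_like(img)
    noisy = (1 - t.view(-1, 1, 1, 1)) * img + t.view(-1, 1, 1, 1) * noise
    return F.mse_loss(model(noisy, t, conds), noise)  # recompose the input

model = TinyDenoiser()
loss = train_step(model, torch.rand(2, 3, 32, 32))
loss.backward()

At inference one would pass whichever factor maps are available and zeros for the rest; training on random subsets is what makes the conditions composable.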

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-huang23b,
  title     = {Composer: Creative and Controllable Image Synthesis with Composable Conditions},
  author    = {Huang, Lianghua and Chen, Di and Liu, Yu and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {13753--13773},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/huang23b/huang23b.pdf},
  url       = {https://proceedings.mlr.press/v202/huang23b.html}
}
Endnote
%0 Conference Paper
%T Composer: Creative and Controllable Image Synthesis with Composable Conditions
%A Lianghua Huang
%A Di Chen
%A Yu Liu
%A Yujun Shen
%A Deli Zhao
%A Jingren Zhou
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-huang23b
%I PMLR
%P 13753--13773
%U https://proceedings.mlr.press/v202/huang23b.html
%V 202
APA
Huang, L., Chen, D., Liu, Y., Shen, Y., Zhao, D. & Zhou, J. (2023). Composer: Creative and Controllable Image Synthesis with Composable Conditions. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:13753-13773. Available from https://proceedings.mlr.press/v202/huang23b.html.
