TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Tianyi Liang, Jiangqi Liu, Yifei Huang, Shiqi Jiang, Jianshen Shi, Changbo Wang, Chenhui Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37237-37256, 2025.

Abstract

Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free approach that actively relocates objects before optimizing text regions, rather than directly reducing cross-attention, which degrades image quality. Our method introduces: (1) a force-directed graph approach that detects conflicting objects and guides their relocation using cross-attention maps, and (2) a spatial attention constraint that ensures smooth background generation in text regions. Our method is plug-and-play, requiring no additional training while balancing both semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across three seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the original semantic fidelity, as measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).
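The abstract names two mechanisms: a force-directed relocation of conflicting objects driven by cross-attention maps, and a spatial attention constraint that keeps the reserved text region smooth. The sketch below is only an illustrative reading of those two ideas, not the authors' implementation; the function names, the inverse-square repulsive force, and the simple attention-mass penalty are assumptions made for this example.

import numpy as np

def attention_centroid(attn_map):
    # Centroid of a 2D cross-attention map, weighted by attention mass.
    ys, xs = np.indices(attn_map.shape)
    total = attn_map.sum() + 1e-8
    return np.array([(ys * attn_map).sum() / total,
                     (xs * attn_map).sum() / total])

def repulsive_shift(attn_map, text_box, strength=4.0):
    # Force-directed update (assumed form): push an object's attention centroid
    # away from a protected text region. text_box = (y0, x0, y1, x1) in map coords.
    cy, cx = attention_centroid(attn_map)
    ty = 0.5 * (text_box[0] + text_box[2])
    tx = 0.5 * (text_box[1] + text_box[3])
    delta = np.array([cy - ty, cx - tx])
    dist = np.linalg.norm(delta) + 1e-8
    return strength * delta / (dist ** 2)  # inverse-square repulsion, directed away from the text box

def text_region_penalty(attn_map, text_box):
    # Spatial constraint (assumed form): total attention mass falling inside the
    # text region; minimizing it encourages a smooth background there.
    y0, x0, y1, x1 = text_box
    return attn_map[y0:y1, x0:x1].sum()

# Toy usage on a random 16x16 attention map with a text box in the lower-right quadrant.
attn = np.random.rand(16, 16)
box = (8, 8, 16, 16)
print("shift direction:", repulsive_shift(attn, box))
print("penalty:", text_region_penalty(attn, box))

In practice the shift and penalty would be applied to the model's cross-attention maps during denoising steps rather than to a random array; the toy usage only checks that the two quantities are computable.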

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liang25i,
  title     = {{T}ext{C}en{G}en: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation},
  author    = {Liang, Tianyi and Liu, Jiangqi and Huang, Yifei and Jiang, Shiqi and Shi, Jianshen and Wang, Changbo and Li, Chenhui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {37237--37256},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liang25i/liang25i.pdf},
  url       = {https://proceedings.mlr.press/v267/liang25i.html},
  abstract  = {Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free approach that actively relocates objects before optimizing text regions, rather than directly reducing cross-attention, which degrades image quality. Our method introduces: (1) a force-directed graph approach that detects conflicting objects and guides their relocation using cross-attention maps, and (2) a spatial attention constraint that ensures smooth background generation in text regions. Our method is plug-and-play, requiring no additional training while balancing both semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across three seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the original semantic fidelity, as measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).}
}
Endnote
%0 Conference Paper
%T TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
%A Tianyi Liang
%A Jiangqi Liu
%A Yifei Huang
%A Shiqi Jiang
%A Jianshen Shi
%A Changbo Wang
%A Chenhui Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liang25i
%I PMLR
%P 37237--37256
%U https://proceedings.mlr.press/v267/liang25i.html
%V 267
%X Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free approach that actively relocates objects before optimizing text regions, rather than directly reducing cross-attention, which degrades image quality. Our method introduces: (1) a force-directed graph approach that detects conflicting objects and guides their relocation using cross-attention maps, and (2) a spatial attention constraint that ensures smooth background generation in text regions. Our method is plug-and-play, requiring no additional training while balancing both semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across three seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the original semantic fidelity, as measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).
APA
Liang, T., Liu, J., Huang, Y., Jiang, S., Shi, J., Wang, C. & Li, C. (2025). TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37237-37256. Available from https://proceedings.mlr.press/v267/liang25i.html.

Related Material