Decomposition of Graphic Design with Unified Multimodal Model

Hui Nie, Zhao Zhang, Yutao Cheng, Maoke Yang, Gonglei Shi, Qingsong Xie, Jie Shao, Xinglong Wu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46377-46388, 2025.

Abstract

We propose Layer Decomposition of Graphic Designs (LDGD), a novel vision task that converts composite graphic designs (e.g., posters) into structured representations comprising ordered RGB-A layers and metadata. By transforming visual content into structured data, LDGD facilitates precise image editing and offers significant advantages for digital content creation, management, and reuse. The task presents two core challenges: (1) predicting the attribute information (metadata) of each layer, and (2) recovering the occluded regions within overlapping layers to enable high-fidelity image reconstruction. To address these challenges, we present the Decompose Layer Model (DeaM), a large unified multimodal model that integrates a conjoined visual encoder, a language model, and a condition-aware RGB-A decoder. DeaM adopts a two-stage processing pipeline: it first generates layer-specific metadata, including information such as spatial coordinates and quantized encodings, and then reconstructs pixel-accurate layer images using the condition-aware RGB-A decoder. Beyond full decomposition, the model supports interactive decomposition via textual or point-based prompts. Extensive experiments demonstrate the effectiveness of the proposed method. The code is available at https://github.com/witnessai/DeaM.
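
To make the task concrete, below is a minimal, illustrative Python sketch of the LDGD output structure and the two-stage pipeline described above. Every name, field, and signature here is a hypothetical stand-in, not the authors' API; the stage-one and stage-two functions are stubs standing in for the multimodal model and the RGB-A decoder (see the repository linked above for the actual implementation).

from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class LayerMetadata:
    """Per-layer attributes predicted in stage one (hypothetical fields)."""
    z_order: int                      # stacking position, 0 = bottom layer
    bbox: Tuple[int, int, int, int]   # (x, y, w, h) spatial coordinates on the canvas
    quantized_code: List[int]         # quantized encoding consumed by the RGB-A decoder


@dataclass
class DecomposedLayer:
    """One element of the ordered LDGD output: metadata plus an RGB-A image."""
    metadata: LayerMetadata
    rgba: np.ndarray                  # (H, W, 4) pixel-accurate layer with alpha


def predict_layer_metadata(composite: np.ndarray,
                           prompt: Optional[str] = None) -> List[LayerMetadata]:
    """Stand-in for stage one: the unified multimodal model emits one metadata
    record per layer, optionally conditioned on a text or point prompt."""
    h, w = composite.shape[:2]
    return [LayerMetadata(z_order=0, bbox=(0, 0, w, h), quantized_code=[0])]


def decode_layer(composite: np.ndarray, meta: LayerMetadata) -> np.ndarray:
    """Stand-in for stage two: the condition-aware RGB-A decoder reconstructs
    the layer's pixels, including regions occluded by layers above it."""
    h, w = composite.shape[:2]
    return np.zeros((h, w, 4), dtype=np.uint8)


def decompose(composite: np.ndarray,
              prompt: Optional[str] = None) -> List[DecomposedLayer]:
    metas = predict_layer_metadata(composite, prompt)        # stage 1: metadata
    return [DecomposedLayer(m, decode_layer(composite, m))   # stage 2: pixels
            for m in sorted(metas, key=lambda m: m.z_order)]

In this reading, full decomposition corresponds to decompose(image), while interactive decomposition corresponds to something like decompose(image, prompt="the headline text").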

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-nie25c,
  title     = {Decomposition of Graphic Design with Unified Multimodal Model},
  author    = {Nie, Hui and Zhang, Zhao and Cheng, Yutao and Yang, Maoke and Shi, Gonglei and Xie, Qingsong and Shao, Jie and Wu, Xinglong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {46377--46388},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nie25c/nie25c.pdf},
  url       = {https://proceedings.mlr.press/v267/nie25c.html},
  abstract  = {We propose Layer Decomposition of Graphic Designs (LDGD), a novel vision task that converts composite graphic designs (e.g., posters) into structured representations comprising ordered RGB-A layers and metadata. By transforming visual content into structured data, LDGD facilitates precise image editing and offers significant advantages for digital content creation, management, and reuse. The task presents two core challenges: (1) predicting the attribute information (metadata) of each layer, and (2) recovering the occluded regions within overlapping layers to enable high-fidelity image reconstruction. To address these challenges, we present the Decompose Layer Model (DeaM), a large unified multimodal model that integrates a conjoined visual encoder, a language model, and a condition-aware RGB-A decoder. DeaM adopts a two-stage processing pipeline: it first generates layer-specific metadata, including information such as spatial coordinates and quantized encodings, and then reconstructs pixel-accurate layer images using the condition-aware RGB-A decoder. Beyond full decomposition, the model supports interactive decomposition via textual or point-based prompts. Extensive experiments demonstrate the effectiveness of the proposed method. The code is available at https://github.com/witnessai/DeaM.}
}
Endnote
%0 Conference Paper
%T Decomposition of Graphic Design with Unified Multimodal Model
%A Hui Nie
%A Zhao Zhang
%A Yutao Cheng
%A Maoke Yang
%A Gonglei Shi
%A Qingsong Xie
%A Jie Shao
%A Xinglong Wu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nie25c
%I PMLR
%P 46377--46388
%U https://proceedings.mlr.press/v267/nie25c.html
%V 267
%X We propose Layer Decomposition of Graphic Designs (LDGD), a novel vision task that converts composite graphic designs (e.g., posters) into structured representations comprising ordered RGB-A layers and metadata. By transforming visual content into structured data, LDGD facilitates precise image editing and offers significant advantages for digital content creation, management, and reuse. The task presents two core challenges: (1) predicting the attribute information (metadata) of each layer, and (2) recovering the occluded regions within overlapping layers to enable high-fidelity image reconstruction. To address these challenges, we present the Decompose Layer Model (DeaM), a large unified multimodal model that integrates a conjoined visual encoder, a language model, and a condition-aware RGB-A decoder. DeaM adopts a two-stage processing pipeline: it first generates layer-specific metadata, including information such as spatial coordinates and quantized encodings, and then reconstructs pixel-accurate layer images using the condition-aware RGB-A decoder. Beyond full decomposition, the model supports interactive decomposition via textual or point-based prompts. Extensive experiments demonstrate the effectiveness of the proposed method. The code is available at https://github.com/witnessai/DeaM.
APA
Nie, H., Zhang, Z., Cheng, Y., Yang, M., Shi, G., Xie, Q., Shao, J., & Wu, X. (2025). Decomposition of Graphic Design with Unified Multimodal Model. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46377-46388. Available from https://proceedings.mlr.press/v267/nie25c.html.