Hypernetworks for image recontextualization

Maciej Zieba, Jakub Balicki, Tomasz Drozdz, Konrad Karanowski, Pawel Lorek, Hong Lyu, Aleksander Piotr Skorupa, Tomasz Trzcinski, Oriol Caudevilla, Jakub M. Tomczak
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:128-139, 2024.

Abstract

Image recontextualization, the task of placing a subject from an image into a new context to serve a specific purpose, has become increasingly important in fields like art, media, marketing, and e-commerce. Recent advancements in deep generative modeling, such as text-to-image and image-to-image synthesis via diffusion models, have significantly improved recontextualization capabilities. However, current methods, like DreamBooth and LoRA, require time-consuming fine-tuning per individual image, resulting in inefficiencies and often suboptimal outputs. Other approaches to recontextualization, like MagicClothing, require reorganization of the architecture of the base model and a time-consuming training process in a particular domain. In this work, we propose HyperLoRA, a novel framework that leverages hypernetworks to predict LoRA parameters, allowing for more efficient image recontextualization without the need for image-specific fine-tuning. HyperLoRA utilizes domain pairs of context images and target objects, enabling instant adaptation to new contexts while significantly reducing computational costs. Our method outperforms traditional techniques by offering more accurate adjustments, broader applicability across multiple modalities (e.g., text, video, sound, and structured data), and scalable deployment. Experimental results demonstrate the effectiveness of our approach in garment-to-model recontextualization, highlighting the potential for broader applications.
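The core idea sketched in the abstract — a hypernetwork maps a context embedding directly to LoRA parameters, so the frozen base model adapts to a new context without per-image fine-tuning — can be illustrated roughly as follows. This is a minimal NumPy sketch for exposition only: the shapes, the single linear hypernetwork, and all names (`predict_lora`, `adapted_forward`, etc.) are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, c = 8, 2, 4  # base layer width, LoRA rank, context-embedding dim

# Frozen base weight of one layer in the generative model.
W = rng.normal(size=(d, d))

# "Hypernetwork": here a single linear map from a context embedding z
# to the flattened LoRA factors. In practice this would be learned on
# domain pairs of context images and target objects.
H_A = rng.normal(size=(r * d, c)) * 0.1
H_B = rng.normal(size=(d * r, c)) * 0.1

def predict_lora(z):
    """Predict low-rank factors A (r x d) and B (d x r) from context z."""
    A = (H_A @ z).reshape(r, d)
    B = (H_B @ z).reshape(d, r)
    return A, B

def adapted_forward(x, z):
    """Apply the base layer with the predicted LoRA update:
    y = (W + B @ A) @ x -- no gradient-based fine-tuning per image."""
    A, B = predict_lora(z)
    return (W + B @ A) @ x

z = rng.normal(size=c)   # embedding of a new context image
x = rng.normal(size=d)   # an activation passing through the layer
y = adapted_forward(x, z)
print(y.shape)  # (8,)
```

The point of the sketch is the contrast with DreamBooth/LoRA fine-tuning: adaptation is a single forward pass through the hypernetwork rather than an optimization loop per image.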

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-zieba24a,
  title     = {Hypernetworks for image recontextualization},
  author    = {Zieba, Maciej and Balicki, Jakub and Drozdz, Tomasz and Karanowski, Konrad and Lorek, Pawel and Lyu, Hong and Skorupa, Aleksander Piotr and Trzcinski, Tomasz and Caudevilla, Oriol and Tomczak, Jakub M.},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {128--139},
  year      = {2024},
  editor    = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume    = {285},
  series    = {Proceedings of Machine Learning Research},
  month     = {14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/zieba24a/zieba24a.pdf},
  url       = {https://proceedings.mlr.press/v285/zieba24a.html},
  abstract  = {Image recontextualization, the task of placing a subject from an image into a new context to serve a specific purpose, has become increasingly important in fields like art, media, marketing, and e-commerce. Recent advancements in deep generative modeling, such as text-to-image and image-to-image synthesis via diffusion models, have significantly improved recontextualization capabilities. However, current methods, like DreamBooth and LoRA, require time-consuming fine-tuning per individual image, resulting in inefficiencies and often suboptimal outputs. Other approaches to recontextualization, like MagicClothing, require reorganization of the architecture of the base model and a time-consuming training process in a particular domain. In this work, we propose HyperLoRA, a novel framework that leverages hypernetworks to predict LoRA parameters, allowing for more efficient image recontextualization without the need for image-specific fine-tuning. HyperLoRA utilizes domain pairs of context images and target objects, enabling instant adaptation to new contexts while significantly reducing computational costs. Our method outperforms traditional techniques by offering more accurate adjustments, broader applicability across multiple modalities (e.g., text, video, sound, and structured data), and scalable deployment. Experimental results demonstrate the effectiveness of our approach in garment-to-model recontextualization, highlighting the potential for broader applications.}
}
Endnote
%0 Conference Paper
%T Hypernetworks for image recontextualization
%A Maciej Zieba
%A Jakub Balicki
%A Tomasz Drozdz
%A Konrad Karanowski
%A Pawel Lorek
%A Hong Lyu
%A Aleksander Piotr Skorupa
%A Tomasz Trzcinski
%A Oriol Caudevilla
%A Jakub M. Tomczak
%B Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Donato Crisostomi
%E Luca Moschella
%E Kimberly Stachenfeld
%F pmlr-v285-zieba24a
%I PMLR
%P 128--139
%U https://proceedings.mlr.press/v285/zieba24a.html
%V 285
%X Image recontextualization, the task of placing a subject from an image into a new context to serve a specific purpose, has become increasingly important in fields like art, media, marketing, and e-commerce. Recent advancements in deep generative modeling, such as text-to-image and image-to-image synthesis via diffusion models, have significantly improved recontextualization capabilities. However, current methods, like DreamBooth and LoRA, require time-consuming fine-tuning per individual image, resulting in inefficiencies and often suboptimal outputs. Other approaches to recontextualization, like MagicClothing, require reorganization of the architecture of the base model and a time-consuming training process in a particular domain. In this work, we propose HyperLoRA, a novel framework that leverages hypernetworks to predict LoRA parameters, allowing for more efficient image recontextualization without the need for image-specific fine-tuning. HyperLoRA utilizes domain pairs of context images and target objects, enabling instant adaptation to new contexts while significantly reducing computational costs. Our method outperforms traditional techniques by offering more accurate adjustments, broader applicability across multiple modalities (e.g., text, video, sound, and structured data), and scalable deployment. Experimental results demonstrate the effectiveness of our approach in garment-to-model recontextualization, highlighting the potential for broader applications.
APA
Zieba, M., Balicki, J., Drozdz, T., Karanowski, K., Lorek, P., Lyu, H., Skorupa, A. P., Trzcinski, T., Caudevilla, O., & Tomczak, J. M. (2024). Hypernetworks for image recontextualization. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:128-139. Available from https://proceedings.mlr.press/v285/zieba24a.html.