MultiPersona-Align: Zero-Shot Multi-Subject Personalized Image Generation with Layout-Guidance via Dual Representation Alignment

Veddhanth Chakravarthy, Samir Kumar das mohapatra, Chandrakala S
Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, PMLR 322:102-114, 2026.

Abstract

We propose MultiPersona-Align, a novel approach to personalized image generation that enhances multi-subject diffusion models through self-supervised feature alignment. While existing methods rely primarily on spatial masking for subject control, they often produce semantically inconsistent features that fail to preserve subject-specific visual characteristics. Our method introduces a Dual Alignment Framework: (1) a Spatially-Aligned Subject-Specific Cross-Attention mechanism that aligns subject-specific diffusion features with the corresponding DINOv2 CLS tokens within their spatial regions, and (2) Patch-Aligned Self-Attention, which ensures global semantic consistency by aligning full-image diffusion features with DINOv2 patch representations. This approach leverages DINOv2's robust semantic understanding without requiring additional training data or annotations. Experiments on multi-subject generation tasks demonstrate that our alignment losses significantly improve subject fidelity and semantic consistency while maintaining spatial control. The method integrates seamlessly into existing architectures, adding minimal computational overhead during training while providing substantial quality improvements in personalized image generation.
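
The abstract is the only technical description on this page, but the two alignment losses it names are concrete enough to sketch. Below is a minimal PyTorch sketch of both terms, assuming masked average pooling over each subject's layout region, cosine-similarity alignment, and a learned linear projection from diffusion-feature channels to DINOv2's embedding dimension; every function and argument name here is hypothetical, and the paper's exact loss formulation may differ.

    import torch
    import torch.nn.functional as F

    def spatial_cls_alignment_loss(diff_feats, subject_masks, dino_cls, proj):
        # diff_feats:    (B, C, H, W) intermediate diffusion (U-Net) features.
        # subject_masks: (B, S, H, W) binary layout mask per subject.
        # dino_cls:      (B, S, D) DINOv2 CLS token of each subject's reference image.
        # proj:          nn.Linear(C, D), trained jointly with this loss (assumed).
        feats = diff_feats.flatten(2)                        # (B, C, H*W)
        masks = subject_masks.flatten(2).float()             # (B, S, H*W)
        pooled = torch.einsum('bsn,bcn->bsc', masks, feats)  # masked sum per region
        pooled = pooled / masks.sum(-1, keepdim=True).clamp(min=1.0)  # masked mean
        pooled = proj(pooled)                                # (B, S, D)
        # One cosine-alignment term per subject region vs. its CLS target.
        return (1 - F.cosine_similarity(pooled, dino_cls, dim=-1)).mean()

    def patch_alignment_loss(diff_feats, dino_patches, proj):
        # dino_patches: (B, N, D) DINOv2 patch tokens; diff_feats is assumed to be
        # resized beforehand so that H*W == N.
        feats = proj(diff_feats.flatten(2).transpose(1, 2))  # (B, H*W, D)
        return (1 - F.cosine_similarity(feats, dino_patches, dim=-1)).mean()

In training, such terms would presumably be added to the standard denoising objective with scalar weights, while inference is unchanged, which is consistent with the abstract's claim that the extra cost is confined to training.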

Cite this Paper


BibTeX
@InProceedings{pmlr-v322-chakravarthy26a,
  title     = {MultiPersona-Align: Zero-Shot Multi-Subject Personalized Image Generation with Layout-Guidance via Dual Representation Alignment},
  author    = {Chakravarthy, Veddhanth and mohapatra, Samir Kumar das and S, Chandrakala},
  booktitle = {Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {102--114},
  year      = {2026},
  editor    = {Fumero, Marco and Domine, Clementine and L{\"a}hner, Zorah and Cannistraci, Irene and Zhao, Bo and Williams, Alex},
  volume    = {322},
  series    = {Proceedings of Machine Learning Research},
  month     = {06 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v322/main/assets/chakravarthy26a/chakravarthy26a.pdf},
  url       = {https://proceedings.mlr.press/v322/chakravarthy26a.html},
  abstract  = {We propose MultiPersona-Align, a novel approach to personalized image generation that enhances multi-subject diffusion models through self-supervised feature alignment. While existing methods rely primarily on spatial masking for subject control, they often produce semantically inconsistent features that fail to preserve subject-specific visual characteristics. Our method introduces a Dual Alignment Framework: (1) a Spatially-Aligned Subject-Specific Cross-Attention mechanism that aligns subject-specific diffusion features with the corresponding DINOv2 CLS tokens within their spatial regions, and (2) Patch-Aligned Self-Attention, which ensures global semantic consistency by aligning full-image diffusion features with DINOv2 patch representations. This approach leverages DINOv2's robust semantic understanding without requiring additional training data or annotations. Experiments on multi-subject generation tasks demonstrate that our alignment losses significantly improve subject fidelity and semantic consistency while maintaining spatial control. The method integrates seamlessly into existing architectures, adding minimal computational overhead during training while providing substantial quality improvements in personalized image generation.}
}
Endnote
%0 Conference Paper
%T MultiPersona-Align: Zero-Shot Multi-Subject Personalized Image Generation with Layout-Guidance via Dual Representation Alignment
%A Veddhanth Chakravarthy
%A Samir Kumar das mohapatra
%A Chandrakala S
%B Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2026
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Irene Cannistraci
%E Bo Zhao
%E Alex Williams
%F pmlr-v322-chakravarthy26a
%I PMLR
%P 102--114
%U https://proceedings.mlr.press/v322/chakravarthy26a.html
%V 322
%X We propose MultiPersona-Align, a novel approach to personalized image generation that enhances multi-subject diffusion models through self-supervised feature alignment. While existing methods rely primarily on spatial masking for subject control, they often produce semantically inconsistent features that fail to preserve subject-specific visual characteristics. Our method introduces a Dual Alignment Framework: (1) a Spatially-Aligned Subject-Specific Cross-Attention mechanism that aligns subject-specific diffusion features with the corresponding DINOv2 CLS tokens within their spatial regions, and (2) Patch-Aligned Self-Attention, which ensures global semantic consistency by aligning full-image diffusion features with DINOv2 patch representations. This approach leverages DINOv2's robust semantic understanding without requiring additional training data or annotations. Experiments on multi-subject generation tasks demonstrate that our alignment losses significantly improve subject fidelity and semantic consistency while maintaining spatial control. The method integrates seamlessly into existing architectures, adding minimal computational overhead during training while providing substantial quality improvements in personalized image generation.
APA
Chakravarthy, V., mohapatra, S.K.d. & S, C. (2026). MultiPersona-Align: Zero-Shot Multi-Subject Personalized Image Generation with Layout-Guidance via Dual Representation Alignment. Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 322:102-114. Available from https://proceedings.mlr.press/v322/chakravarthy26a.html.