MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Di Chang, Yichun Shi, Quankai Gao, Hongyi Xu, Jessica Fu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:6263-6285, 2024.

Abstract

In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate new images of a person by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motion and appearance (e.g., facial expressions, skin tone, and clothing), consisting of (1) the pre-training of an appearance-control block and (2) learning appearance-disentangled pose control. Our design enables robust appearance control over generated human images, including the body, facial attributes, and even the background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without additional fine-tuning. Moreover, the proposed model is easy to use and can be considered a plug-in module/extension to Stable Diffusion.
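To make the two-stage strategy concrete, the following is a minimal, illustrative PyTorch sketch of the freeze/train scheduling the abstract describes: stage 1 pre-trains an appearance-control branch, stage 2 learns pose control with appearance held fixed. All module names (AppearanceControlBlock, PoseControlNet, FrozenBackbone), the dummy data, and the reconstruction loss are hypothetical stand-ins for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceControlBlock(nn.Module):
    """Placeholder branch encoding the reference image's appearance
    (identity, attire, background) into conditioning features."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())

    def forward(self, ref: torch.Tensor) -> torch.Tensor:
        return self.net(ref)

class PoseControlNet(nn.Module):
    """Placeholder ControlNet-style branch for pose/expression condition maps."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        return self.net(pose)

class FrozenBackbone(nn.Module):
    """Stand-in for the pre-trained Stable Diffusion U-Net, kept frozen."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.head = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)

def run_stage(train_mods, frozen_mods, loss_fn, steps=10):
    """Optimize only `train_mods`; keep `frozen_mods` fixed."""
    for m in train_mods:
        m.requires_grad_(True)
    for m in frozen_mods:
        m.requires_grad_(False)
    opt = torch.optim.AdamW(
        [p for m in train_mods for p in m.parameters()], lr=1e-4)
    for _ in range(steps):
        loss = loss_fn()
        opt.zero_grad()
        loss.backward()
        opt.step()

backbone = FrozenBackbone()
app_block = AppearanceControlBlock()
pose_net = PoseControlNet()

def stage1_loss():
    # Reconstruct another frame of the same identity from appearance
    # features alone; a crude surrogate for the diffusion denoising loss.
    ref, tgt = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    return F.mse_loss(backbone(app_block(ref)), tgt)

def stage2_loss():
    # Add the pose branch on top of the pre-trained appearance features.
    ref, pose, tgt = (torch.randn(2, 3, 64, 64) for _ in range(3))
    return F.mse_loss(backbone(app_block(ref) + pose_net(pose)), tgt)

# Stage 1: pre-train the appearance-control block (backbone, pose branch frozen).
run_stage([app_block], [backbone, pose_net], stage1_loss)
# Stage 2: learn appearance-disentangled pose control; the appearance block is
# frozen here for illustration -- the abstract does not pin down this detail.
run_stage([pose_net], [backbone, app_block], stage2_loss)
```

The key design point the sketch illustrates is that the pre-trained diffusion backbone stays frozen throughout, which is what lets the trained branches act as a plug-in extension to Stable Diffusion rather than a fine-tuned copy of it.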

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-chang24d,
  title     = {{M}agic{P}ose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion},
  author    = {Chang, Di and Shi, Yichun and Gao, Quankai and Xu, Hongyi and Fu, Jessica and Song, Guoxian and Yan, Qing and Zhu, Yizhe and Yang, Xiao and Soleymani, Mohammad},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {6263--6285},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24d/chang24d.pdf},
  url       = {https://proceedings.mlr.press/v235/chang24d.html}
}
Endnote
%0 Conference Paper
%T MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
%A Di Chang
%A Yichun Shi
%A Quankai Gao
%A Hongyi Xu
%A Jessica Fu
%A Guoxian Song
%A Qing Yan
%A Yizhe Zhu
%A Xiao Yang
%A Mohammad Soleymani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-chang24d
%I PMLR
%P 6263--6285
%U https://proceedings.mlr.press/v235/chang24d.html
%V 235
APA
Chang, D., Shi, Y., Gao, Q., Xu, H., Fu, J., Song, G., Yan, Q., Zhu, Y., Yang, X. & Soleymani, M. (2024). MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:6263-6285. Available from https://proceedings.mlr.press/v235/chang24d.html.