Pose Guided Human Image Synthesis with Partially Decoupled GAN

Jianhan Wu, Shijing Si, Jianzong Wang, Xiaoyang Qu, Xiao Jing
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:1133-1148, 2023.

Abstract

Pose Guided Human Image Synthesis (PGHIS) is the challenging task of transforming a human image from a reference pose to a target pose while preserving its style. Most existing methods encode the texture of the whole reference human image into a latent space and then use a decoder to synthesize the image texture of the target pose. However, it is difficult to recover detailed texture for the whole human image this way. To alleviate this problem, we propose a method that decouples the human body into several parts (e.g., hair, face, hands, feet) and uses each part to guide the synthesis of a realistic image of the person, which preserves fine detail in the generated images. In addition, we design a multi-head attention-based module for PGHIS. Because convolutional neural network-based methods have difficulty modeling long-range dependencies due to the local nature of the convolution operation, the long-range modeling capability of the attention mechanism makes it better suited than convolutional networks to the pose transfer task, especially under sharp pose deformation. Extensive experiments on the Market-1501 and DeepFashion datasets show that our method outperforms existing state-of-the-art methods on nearly all qualitative and quantitative metrics.
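As a rough illustration of the idea sketched in the abstract, the snippet below shows how per-part reference texture features could be fused into target-pose features with multi-head attention. The module name, tensor shapes, and fusion strategy are illustrative assumptions only; this is a minimal sketch, not the authors' implementation.

# Hypothetical sketch: reference textures are split into per-part feature
# maps (e.g. hair, face, hands, feet), and a multi-head attention block
# lets target-pose features attend to every part's texture tokens.
# Names and shapes are assumptions for illustration, not the paper's code.
from typing import List

import torch
import torch.nn as nn


class PartGuidedAttentionBlock(nn.Module):
    """Fuse per-part reference features into target-pose features (sketch)."""

    def __init__(self, channels: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, pose_feat: torch.Tensor, part_feats: List[torch.Tensor]) -> torch.Tensor:
        # pose_feat:  (B, C, H, W) features decoded from the target pose
        # part_feats: list of (B, C, h, w) features, one per decoupled body part
        b, c, h, w = pose_feat.shape
        query = pose_feat.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        # Concatenate all part tokens so every spatial query can attend to
        # every body part's texture (long-range dependency across parts).
        kv = torch.cat([f.flatten(2).transpose(1, 2) for f in part_feats], dim=1)
        fused, _ = self.attn(query, kv, kv)                          # (B, H*W, C)
        out = self.norm(query + fused)                               # residual + norm
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = PartGuidedAttentionBlock(channels=256, num_heads=8)
    pose = torch.randn(2, 256, 16, 16)                               # target-pose features
    parts = [torch.randn(2, 256, 8, 8) for _ in range(4)]            # 4 decoupled parts
    print(block(pose, parts).shape)                                  # torch.Size([2, 256, 16, 16])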

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-wu23a,
  title     = {Pose Guided Human Image Synthesis with Partially Decoupled GAN},
  author    = {Wu, Jianhan and Si, Shijing and Wang, Jianzong and Qu, Xiaoyang and Jing, Xiao},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {1133--1148},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/wu23a/wu23a.pdf},
  url       = {https://proceedings.mlr.press/v189/wu23a.html}
}
Endnote
%0 Conference Paper
%T Pose Guided Human Image Synthesis with Partially Decoupled GAN
%A Jianhan Wu
%A Shijing Si
%A Jianzong Wang
%A Xiaoyang Qu
%A Xiao Jing
%B Proceedings of The 14th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen
%F pmlr-v189-wu23a
%I PMLR
%P 1133--1148
%U https://proceedings.mlr.press/v189/wu23a.html
%V 189
APA
Wu, J., Si, S., Wang, J., Qu, X. & Jing, X. (2023). Pose Guided Human Image Synthesis with Partially Decoupled GAN. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:1133-1148. Available from https://proceedings.mlr.press/v189/wu23a.html.