MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74896-74910, 2025.

Abstract

In recent years, while generative AI has advanced significantly in image generation, video generation continues to face challenges in controllability, length, and detail quality, which hinder its application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which reduces image distortion in key regions. Lastly, we propose a progressive latent fusion strategy to generate long and smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25v,
  title = {{M}imic{M}otion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance},
  author = {Zhang, Yuang and Gu, Jiaxi and Wang, Li-Wen and Wang, Han and Cheng, Junqi and Zhu, Yuefeng and Zou, Fangyuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {74896--74910},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25v/zhang25v.pdf},
  url = {https://proceedings.mlr.press/v267/zhang25v.html},
  abstract = {In recent years, while generative AI has advanced significantly in image generation, video generation continues to face challenges in controllability, length, and detail quality, which hinder its application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which reduces image distortion in key regions. Lastly, we propose a progressive latent fusion strategy to generate long and smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion.}
}
Endnote
%0 Conference Paper
%T MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
%A Yuang Zhang
%A Jiaxi Gu
%A Li-Wen Wang
%A Han Wang
%A Junqi Cheng
%A Yuefeng Zhu
%A Fangyuan Zou
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25v
%I PMLR
%P 74896--74910
%U https://proceedings.mlr.press/v267/zhang25v.html
%V 267
%X In recent years, while generative AI has advanced significantly in image generation, video generation continues to face challenges in controllability, length, and detail quality, which hinder its application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which reduces image distortion in key regions. Lastly, we propose a progressive latent fusion strategy to generate long and smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion.
APA
Zhang, Y., Gu, J., Wang, L.-W., Wang, H., Cheng, J., Zhu, Y., & Zou, F. (2025). MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74896-74910. Available from https://proceedings.mlr.press/v267/zhang25v.html.

Related Material