MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74896-74910, 2025.
Abstract
In recent years, while generative AI has advanced significantly in image generation, video generation continues to face challenges in controllability, length, and detail quality, which hinder its application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which reduces image distortion in key regions. Lastly, we propose a progressive latent fusion strategy to generate long and smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion.