[edit]
Multi-Transmotion: Pre-trained Model for Human Motion Prediction
Proceedings of The 8th Conference on Robot Learning, PMLR 270:2811-2827, 2025.
Abstract
The ability of intelligent systems to predict human behaviors is essential, particularly in fields such as autonomous vehicle navigation and social robotics. However, the intricacies of human motion have precluded the development of a standardized dataset and model for human motion prediction, thereby hindering the establishment of pre-trained models. In this paper, we address these limitations by integrating multiple datasets, encompassing both trajectory and 3D pose keypoints, to further propose a pre-trained model for human motion prediction. We merge seven distinct datasets across varying modalities and standardize their formats. To facilitate multimodal pre-training, we introduce Multi-Transmotion, an innovative transformer-based model capable of cross-modality pre-training. Additionally, we devise a novel masking strategy to learn rich representations. Our methodology demonstrates competitive performance across various datasets on several downstream tasks, including trajectory prediction in the NBA and JTA datasets, as well as pose prediction in the AMASS and 3DPW datasets. The code will be made available upon publication.