Multi-Transmotion: Pre-trained Model for Human Motion Prediction

Yang Gao, Po-Chien Luan, Alexandre Alahi
Proceedings of The 8th Conference on Robot Learning, PMLR 270:2811-2827, 2025.

Abstract

The ability of intelligent systems to predict human behaviors is essential, particularly in fields such as autonomous vehicle navigation and social robotics. However, the intricacies of human motion have precluded the development of a standardized dataset and model for human motion prediction, thereby hindering the establishment of pre-trained models. In this paper, we address these limitations by integrating multiple datasets, encompassing both trajectory and 3D pose keypoints, to further propose a pre-trained model for human motion prediction. We merge seven distinct datasets across varying modalities and standardize their formats. To facilitate multimodal pre-training, we introduce Multi-Transmotion, an innovative transformer-based model capable of cross-modality pre-training. Additionally, we devise a novel masking strategy to learn rich representations. Our methodology demonstrates competitive performance across various datasets on several downstream tasks, including trajectory prediction in the NBA and JTA datasets, as well as pose prediction in the AMASS and 3DPW datasets. The code will be made available upon publication.

Cite this Paper

BibTeX
@InProceedings{pmlr-v270-gao25b,
  title     = {Multi-Transmotion: Pre-trained Model for Human Motion Prediction},
  author    = {Gao, Yang and Luan, Po-Chien and Alahi, Alexandre},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {2811--2827},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/gao25b/gao25b.pdf},
  url       = {https://proceedings.mlr.press/v270/gao25b.html},
  abstract  = {The ability of intelligent systems to predict human behaviors is essential, particularly in fields such as autonomous vehicle navigation and social robotics. However, the intricacies of human motion have precluded the development of a standardized dataset and model for human motion prediction, thereby hindering the establishment of pre-trained models. In this paper, we address these limitations by integrating multiple datasets, encompassing both trajectory and 3D pose keypoints, to further propose a pre-trained model for human motion prediction. We merge seven distinct datasets across varying modalities and standardize their formats. To facilitate multimodal pre-training, we introduce Multi-Transmotion, an innovative transformer-based model capable of cross-modality pre-training. Additionally, we devise a novel masking strategy to learn rich representations. Our methodology demonstrates competitive performance across various datasets on several downstream tasks, including trajectory prediction in the NBA and JTA datasets, as well as pose prediction in the AMASS and 3DPW datasets. The code will be made available upon publication.}
}
Endnote
%0 Conference Paper
%T Multi-Transmotion: Pre-trained Model for Human Motion Prediction
%A Yang Gao
%A Po-Chien Luan
%A Alexandre Alahi
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-gao25b
%I PMLR
%P 2811--2827
%U https://proceedings.mlr.press/v270/gao25b.html
%V 270
%X The ability of intelligent systems to predict human behaviors is essential, particularly in fields such as autonomous vehicle navigation and social robotics. However, the intricacies of human motion have precluded the development of a standardized dataset and model for human motion prediction, thereby hindering the establishment of pre-trained models. In this paper, we address these limitations by integrating multiple datasets, encompassing both trajectory and 3D pose keypoints, to further propose a pre-trained model for human motion prediction. We merge seven distinct datasets across varying modalities and standardize their formats. To facilitate multimodal pre-training, we introduce Multi-Transmotion, an innovative transformer-based model capable of cross-modality pre-training. Additionally, we devise a novel masking strategy to learn rich representations. Our methodology demonstrates competitive performance across various datasets on several downstream tasks, including trajectory prediction in the NBA and JTA datasets, as well as pose prediction in the AMASS and 3DPW datasets. The code will be made available upon publication.
APA
Gao, Y., Luan, P.-C., & Alahi, A. (2025). Multi-Transmotion: Pre-trained Model for Human Motion Prediction. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:2811-2827. Available from https://proceedings.mlr.press/v270/gao25b.html.