JointMotion: Joint Self-Supervision for Joint Motion Prediction

Royden Wagner, Omer Sahin Tas, Marvin Klemp, Carlos Fernandez
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3395-3406, 2025.

Abstract

We present JointMotion, a self-supervised pre-training method for joint motion prediction in self-driving vehicles. Our method jointly optimizes a scene-level objective connecting motion and environments, and an instance-level objective to refine learned representations. Scene-level representations are learned via non-contrastive similarity learning of past motion sequences and environment context. At the instance level, we use masked autoencoding to refine multimodal polyline representations. We complement this with an adaptive pre-training decoder that enables JointMotion to generalize across different environment representations, fusion mechanisms, and dataset characteristics. Notably, our method reduces the joint final displacement error of Wayformer, HPTR, and Scene Transformer models by 3%, 8%, and 12%, respectively; and enables transfer learning between the Waymo Open Motion and the Argoverse 2 Motion Forecasting datasets.

Cite this Paper

BibTeX
@InProceedings{pmlr-v270-wagner25a,
  title = {JointMotion: Joint Self-Supervision for Joint Motion Prediction},
  author = {Wagner, Royden and Tas, Omer Sahin and Klemp, Marvin and Fernandez, Carlos},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {3395--3406},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/wagner25a/wagner25a.pdf},
  url = {https://proceedings.mlr.press/v270/wagner25a.html},
  abstract = {We present JointMotion, a self-supervised pre-training method for joint motion prediction in self-driving vehicles. Our method jointly optimizes a scene-level objective connecting motion and environments, and an instance-level objective to refine learned representations. Scene-level representations are learned via non-contrastive similarity learning of past motion sequences and environment context. At the instance level, we use masked autoencoding to refine multimodal polyline representations. We complement this with an adaptive pre-training decoder that enables JointMotion to generalize across different environment representations, fusion mechanisms, and dataset characteristics. Notably, our method reduces the joint final displacement error of Wayformer, HPTR, and Scene Transformer models by 3%, 8%, and 12%, respectively; and enables transfer learning between the Waymo Open Motion and the Argoverse 2 Motion Forecasting datasets.}
}
Endnote
%0 Conference Paper
%T JointMotion: Joint Self-Supervision for Joint Motion Prediction
%A Royden Wagner
%A Omer Sahin Tas
%A Marvin Klemp
%A Carlos Fernandez
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-wagner25a
%I PMLR
%P 3395--3406
%U https://proceedings.mlr.press/v270/wagner25a.html
%V 270
%X We present JointMotion, a self-supervised pre-training method for joint motion prediction in self-driving vehicles. Our method jointly optimizes a scene-level objective connecting motion and environments, and an instance-level objective to refine learned representations. Scene-level representations are learned via non-contrastive similarity learning of past motion sequences and environment context. At the instance level, we use masked autoencoding to refine multimodal polyline representations. We complement this with an adaptive pre-training decoder that enables JointMotion to generalize across different environment representations, fusion mechanisms, and dataset characteristics. Notably, our method reduces the joint final displacement error of Wayformer, HPTR, and Scene Transformer models by 3%, 8%, and 12%, respectively; and enables transfer learning between the Waymo Open Motion and the Argoverse 2 Motion Forecasting datasets.
APA
Wagner, R., Tas, O.S., Klemp, M. & Fernandez, C. (2025). JointMotion: Joint Self-Supervision for Joint Motion Prediction. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3395-3406. Available from https://proceedings.mlr.press/v270/wagner25a.html.