Semantics-aware Test-time Adaptation for 3D Human Pose Estimation

Qiuxia Lin, Rongyu Chen, Kerui Gu, Angela Yao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37780-37796, 2025.

Abstract

This work highlights a semantics misalignment in 3D human pose estimation. For the task of test-time adaptation, the misalignment manifests as overly smoothed and unguided predictions. The smoothing settles predictions towards some average pose. Furthermore, when there are occlusions or truncations, the adaptation becomes fully unguided. To this end, we pioneer the integration of a semantics-aware motion prior for the test-time adaptation of 3D pose estimation. We leverage video understanding and a well-structured motion-text space to adapt the model motion prediction to adhere to video semantics during test time. Additionally, we incorporate a missing 2D pose completion based on the motion-text similarity. The pose completion strengthens the motion prior’s guidance for occlusions and truncations. Our method significantly improves state-of-the-art 3D human pose estimation TTA techniques, with more than 12% decrease in PA-MPJPE on 3DPW and 3DHP.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lin25d, title = {Semantics-aware Test-time Adaptation for 3{D} Human Pose Estimation}, author = {Lin, Qiuxia and Chen, Rongyu and Gu, Kerui and Yao, Angela}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {37780--37796}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lin25d/lin25d.pdf}, url = {https://proceedings.mlr.press/v267/lin25d.html}, abstract = {This work highlights a semantics misalignment in 3D human pose estimation. For the task of test-time adaptation, the misalignment manifests as overly smoothed and unguided predictions. The smoothing settles predictions towards some average pose. Furthermore, when there are occlusions or truncations, the adaptation becomes fully unguided. To this end, we pioneer the integration of a semantics-aware motion prior for the test-time adaptation of 3D pose estimation. We leverage video understanding and a well-structured motion-text space to adapt the model motion prediction to adhere to video semantics during test time. Additionally, we incorporate a missing 2D pose completion based on the motion-text similarity. The pose completion strengthens the motion prior’s guidance for occlusions and truncations. Our method significantly improves state-of-the-art 3D human pose estimation TTA techniques, with more than 12% decrease in PA-MPJPE on 3DPW and 3DHP.} }
Endnote
%0 Conference Paper %T Semantics-aware Test-time Adaptation for 3D Human Pose Estimation %A Qiuxia Lin %A Rongyu Chen %A Kerui Gu %A Angela Yao %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-lin25d %I PMLR %P 37780--37796 %U https://proceedings.mlr.press/v267/lin25d.html %V 267 %X This work highlights a semantics misalignment in 3D human pose estimation. For the task of test-time adaptation, the misalignment manifests as overly smoothed and unguided predictions. The smoothing settles predictions towards some average pose. Furthermore, when there are occlusions or truncations, the adaptation becomes fully unguided. To this end, we pioneer the integration of a semantics-aware motion prior for the test-time adaptation of 3D pose estimation. We leverage video understanding and a well-structured motion-text space to adapt the model motion prediction to adhere to video semantics during test time. Additionally, we incorporate a missing 2D pose completion based on the motion-text similarity. The pose completion strengthens the motion prior’s guidance for occlusions and truncations. Our method significantly improves state-of-the-art 3D human pose estimation TTA techniques, with more than 12% decrease in PA-MPJPE on 3DPW and 3DHP.
APA
Lin, Q., Chen, R., Gu, K. & Yao, A.. (2025). Semantics-aware Test-time Adaptation for 3D Human Pose Estimation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37780-37796 Available from https://proceedings.mlr.press/v267/lin25d.html.

Related Material