FasterVoxelPose+: Fast and Accurate Voxel-based 3D Human Pose Estimation by Depth-wise Projection Decay

Zonghuang Zhuang, Yue Zhou
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1763-1778, 2024.

Abstract

In multi-person, multi-view 3D pose estimation, voxel-based methods achieve promising accuracy by directly manipulating features in 3D space. Because their high computational cost limits practical use, Faster VoxelPose was proposed to address the problem by re-projecting the 3D feature volume onto the coordinate planes, which greatly improves efficiency. However, it suffers from a noticeable drop in accuracy, especially when fewer cameras are available. In this paper, we propose a more accurate real-time 3D pose estimation method, FasterVoxelPose+, to address this problem. We make two improvements over the previous methods. First, we propose a novel method for constructing the voxel feature volume, called Depth-wise Projection Decay (DPD), which introduces extra depth information into the projection to alleviate depth ambiguity. Second, we design an Encoder-Decoder Network for processing the re-projected voxel features to further improve performance. Our method obtains 17.42 mm MPJPE on Panoptic at real-time speed and can be easily used in other voxel-based models.
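
For readers unfamiliar with the metric quoted in the abstract, MPJPE (mean per-joint position error) is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below is illustrative only and is not taken from the paper: the function name `mpjpe`, the array layout `(poses, joints, 3)`, and the assumption that predicted poses have already been matched to ground-truth persons are all choices made for this example. Evaluation protocols on Panoptic may differ in how matching is done.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (num_poses, num_joints, 3), in the same
    unit (e.g. millimetres). Poses are assumed to be matched already,
    i.e. pred[i] corresponds to gt[i]; that is an assumption of this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Euclidean distance per joint, then the mean over all joints and poses.
    per_joint_error = np.linalg.norm(pred - gt, axis=-1)
    return float(per_joint_error.mean())


# Toy data: every joint is displaced by (3, 4, 0), so each error is 5.
gt = np.zeros((2, 15, 3))
pred = gt.copy()
pred[..., 0] += 3.0
pred[..., 1] += 4.0
print(mpjpe(pred, gt))  # 5.0
```

With coordinates expressed in millimetres, the returned value is directly comparable in unit to the 17.42 mm figure above, though not in protocol unless the same matching and joint set are used.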

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-zhuang24a,
  title = {{FasterVoxelPose+}: {F}ast and Accurate Voxel-based {3D} Human Pose Estimation by Depth-wise Projection Decay},
  author = {Zhuang, Zonghuang and Zhou, Yue},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages = {1763--1778},
  year = {2024},
  editor = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume = {222},
  series = {Proceedings of Machine Learning Research},
  month = {11--14 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v222/zhuang24a/zhuang24a.pdf},
  url = {https://proceedings.mlr.press/v222/zhuang24a.html},
  abstract = {In terms of multi-person multi-view 3D pose estimation, voxel-based methods gain promising accuracy by directly manipulating features in 3D space. Since their high computational cost prevents them from practical applications, Faster VoxelPose was proposed to address this complication by re-projecting the 3D feature volume onto coordinate planes, which greatly improved the efficiency of the model. However, it suffers from an obvious performance drop, especially when there are fewer cameras. In this paper, we propose a more accurate real-time 3D pose estimation method, FasterVoxelPose+, to address the above problem. We have made two improvements to the previous methods. First, we propose a novel method for constructing voxel feature volume called Depth-wise Projection Decay (DPD). It introduces extra depth information to the projection to alleviate depth ambiguity. Second, we design an Encoder-Decoder Network for processing the re-projected voxel features to further push up the performance of the model. Our method obtains 17.42mm MPJPE on Panoptic with real-time speed and can be easily used in other voxel-based models.}
}
Endnote
%0 Conference Paper
%T FasterVoxelPose+: Fast and Accurate Voxel-based 3D Human Pose Estimation by Depth-wise Projection Decay
%A Zonghuang Zhuang
%A Yue Zhou
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-zhuang24a
%I PMLR
%P 1763--1778
%U https://proceedings.mlr.press/v222/zhuang24a.html
%V 222
%X In terms of multi-person multi-view 3D pose estimation, voxel-based methods gain promising accuracy by directly manipulating features in 3D space. Since their high computational cost prevents them from practical applications, Faster VoxelPose was proposed to address this complication by re-projecting the 3D feature volume onto coordinate planes, which greatly improved the efficiency of the model. However, it suffers from an obvious performance drop, especially when there are fewer cameras. In this paper, we propose a more accurate real-time 3D pose estimation method, FasterVoxelPose+, to address the above problem. We have made two improvements to the previous methods. First, we propose a novel method for constructing voxel feature volume called Depth-wise Projection Decay (DPD). It introduces extra depth information to the projection to alleviate depth ambiguity. Second, we design an Encoder-Decoder Network for processing the re-projected voxel features to further push up the performance of the model. Our method obtains 17.42mm MPJPE on Panoptic with real-time speed and can be easily used in other voxel-based models.
APA
Zhuang, Z. & Zhou, Y. (2024). FasterVoxelPose+: Fast and Accurate Voxel-based 3D Human Pose Estimation by Depth-wise Projection Decay. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:1763-1778. Available from https://proceedings.mlr.press/v222/zhuang24a.html.
