FasterVoxelPose+: Fast and Accurate Voxel-based 3D Human Pose Estimation by Depth-wise Projection Decay
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1763-1778, 2024.
Abstract
In multi-person, multi-view 3D pose estimation, voxel-based methods achieve promising accuracy by directly manipulating features in 3D space. However, their high computational cost hinders practical application. Faster VoxelPose was proposed to address this problem by re-projecting the 3D feature volume onto the coordinate planes, which greatly improves efficiency. It nevertheless suffers from a noticeable drop in accuracy, especially when fewer cameras are available. In this paper, we propose FasterVoxelPose+, a more accurate real-time 3D pose estimation method, to address this drop. We make two improvements over previous methods. First, we propose a novel method for constructing the voxel feature volume, called Depth-wise Projection Decay (DPD), which introduces extra depth information into the projection to alleviate depth ambiguity. Second, we design an encoder-decoder network that processes the re-projected voxel features to further improve performance. Our method achieves an MPJPE of 17.42 mm on Panoptic at real-time speed and can be easily integrated into other voxel-based models.
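For readers unfamiliar with the reported metric, MPJPE (mean per-joint position error) is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below is a minimal illustration under that standard definition; the function name `mpjpe`, the toy coordinates, and the assumption that joints are already matched one-to-one are ours, and the abstract does not describe the paper's exact evaluation protocol.

```python
import math


def mpjpe(pred, gt):
    """Average Euclidean distance between matched 3D joints.

    pred, gt: equal-length sequences of (x, y, z) tuples in millimetres.
    Assumes joints are already matched one-to-one (illustrative only).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy example with two joints (hypothetical values, not from the paper).
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]
error_mm = mpjpe(pred, gt)  # per-joint errors are 5 mm and 12 mm
```

In a multi-person setting, a matching step between predicted and ground-truth people would normally precede this computation; that step is omitted here.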