HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Andrei Zanfir, Mihai Zanfir, Alex Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1114-1124, 2023.

Abstract

Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades – with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information – not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently uses these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
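The core idea the abstract describes – projecting LiDAR points into the image to build pixel-aligned multi-modal point features, then refining them with Transformer-style attention – can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pinhole intrinsics, nearest-neighbor feature sampling, and single-head residual attention below are simplifying assumptions made purely for illustration.

```python
import numpy as np

def project_points(points_cam, fx, fy, cx, cy):
    # Pinhole projection of 3D points (camera frame, z forward) to pixels.
    z = points_cam[:, 2]
    u = fx * points_cam[:, 0] / z + cx
    v = fy * points_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def pixel_aligned_features(points_cam, feat_map, fx, fy, cx, cy):
    # feat_map: (H, W, C) image feature map, e.g. from a CNN backbone.
    # Returns one (3 + C)-dim multi-modal feature per LiDAR point.
    H, W, _ = feat_map.shape
    uv = project_points(points_cam, fx, fy, cx, cy)
    # Nearest-neighbor sampling for brevity (a real model would
    # typically sample bilinearly).
    ui = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    vi = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    img_feats = feat_map[vi, ui]                          # (N, C)
    # Concatenate raw 3D coordinates with the sampled image features.
    return np.concatenate([points_cam, img_feats], axis=1)  # (N, 3 + C)

def attention_refine(x, Wq, Wk, Wv):
    # One single-head self-attention refinement stage with a residual
    # connection; a stack of such stages refines the point features.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)
    return x + a @ v
```

A usage sketch: per-point features from `pixel_aligned_features` are passed through several `attention_refine` stages, after which a regression head (omitted here) would predict the 3D keypoints.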

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-zanfir23a,
  title     = {HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving},
  author    = {Zanfir, Andrei and Zanfir, Mihai and Gorban, Alex and Ji, Jingwei and Zhou, Yin and Anguelov, Dragomir and Sminchisescu, Cristian},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1114--1124},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/zanfir23a/zanfir23a.pdf},
  url       = {https://proceedings.mlr.press/v205/zanfir23a.html},
  abstract  = {Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades – with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information – not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently uses these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.}
}
Endnote
%0 Conference Paper %T HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving %A Andrei Zanfir %A Mihai Zanfir %A Alex Gorban %A Jingwei Ji %A Yin Zhou %A Dragomir Anguelov %A Cristian Sminchisescu %B Proceedings of The 6th Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2023 %E Karen Liu %E Dana Kulic %E Jeff Ichnowski %F pmlr-v205-zanfir23a %I PMLR %P 1114--1124 %U https://proceedings.mlr.press/v205/zanfir23a.html %V 205 %X Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades – with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information – not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently uses these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
APA
Zanfir, A., Zanfir, M., Gorban, A., Ji, J., Zhou, Y., Anguelov, D. & Sminchisescu, C. (2023). HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1114-1124. Available from https://proceedings.mlr.press/v205/zanfir23a.html.