Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching

Sirui Chen, Yufei Ye, Zi-ang Cao, Pei Xu, Jennifer Lew, Karen Liu
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4058-4073, 2025.

Abstract

We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids directly from human motion and vision perception data. We take a modular approach in which a high-level planner commands the target positions and orientations of the humanoid's hands and eyes, which are delivered by a low-level policy that controls the whole-body movements. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data, while the high-level policy learns from human data collected by Aria glasses. Our modular approach decouples ego-centric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real world, demonstrating the humanoid's capabilities to navigate and reach in complex environments designed for humans.
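The abstract describes a modular decomposition: a high-level planner maps ego-centric perception to target poses for the eyes and two hands, and a low-level whole-body controller tracks those three points. The sketch below illustrates only that interface; it is not the authors' code, and every class, method, and array shape is a hypothetical placeholder chosen for illustration (Python).

    # Minimal sketch (not the authors' implementation) of HEAD's modular split:
    # a high-level planner emits 3-point targets (eyes, left hand, right hand),
    # and a low-level whole-body policy tracks them. All names are hypothetical.
    from dataclasses import dataclass
    import numpy as np


    @dataclass
    class ThreePointTarget:
        """Targets commanded by the high-level planner."""
        eye_pos: np.ndarray         # (3,) eye/head position
        eye_rot: np.ndarray         # (3, 3) eye/head orientation
        left_hand_pos: np.ndarray   # (3,) left-hand position
        right_hand_pos: np.ndarray  # (3,) right-hand position


    class HighLevelPlanner:
        """Maps ego-centric vision to 3-point targets (trained on Aria-glasses data)."""
        def plan(self, ego_rgb: np.ndarray, goal: np.ndarray) -> ThreePointTarget:
            raise NotImplementedError  # e.g., a learned policy over ego-centric images


    class WholeBodyController:
        """Tracks the 3-point targets with joint-level actions (trained on mocap data)."""
        def act(self, proprio: np.ndarray, target: ThreePointTarget) -> np.ndarray:
            raise NotImplementedError  # returns joint-level commands


    def control_step(planner, controller, ego_rgb, goal, proprio):
        """One tick of the decoupled pipeline: perception -> 3-point targets -> whole-body action."""
        target = planner.plan(ego_rgb, goal)
        return controller.act(proprio, target)

Because the two modules communicate only through the three target points, the vision-driven planner and the physics-driven controller can be trained on different data sources (Aria-glasses recordings and motion capture, respectively), which is the decoupling the abstract emphasizes.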

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-chen25e,
  title     = {Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching},
  author    = {Chen, Sirui and Ye, Yufei and Cao, Zi-ang and Xu, Pei and Lew, Jennifer and Liu, Karen},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {4058--4073},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/chen25e/chen25e.pdf},
  url       = {https://proceedings.mlr.press/v305/chen25e.html},
  abstract  = {We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids, directly from human motion and vision perception data. We take a modular approach where the high-level planner commands the target position and orientation of the hands and eyes of the humanoid, delivered by the low-level policy that controls the whole-body movements. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data while high-level policy learns from human data collected by Aria glasses. Our modular approach decouples the ego-centric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real-world, demonstrating humanoid's capabilities to navigate and reach in complex environments designed for humans.}
}
Endnote
%0 Conference Paper
%T Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching
%A Sirui Chen
%A Yufei Ye
%A Zi-ang Cao
%A Pei Xu
%A Jennifer Lew
%A Karen Liu
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-chen25e
%I PMLR
%P 4058--4073
%U https://proceedings.mlr.press/v305/chen25e.html
%V 305
%X We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids, directly from human motion and vision perception data. We take a modular approach where the high-level planner commands the target position and orientation of the hands and eyes of the humanoid, delivered by the low-level policy that controls the whole-body movements. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data while high-level policy learns from human data collected by Aria glasses. Our modular approach decouples the ego-centric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real-world, demonstrating humanoid's capabilities to navigate and reach in complex environments designed for humans.
APA
Chen, S., Ye, Y., Cao, Z., Xu, P., Lew, J. & Liu, K. (2025). Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4058-4073. Available from https://proceedings.mlr.press/v305/chen25e.html.