Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids

Kaizhe Hu, Haochen Shi, Yao He, Weizhuo Wang, Karen Liu, Shuran Song
Proceedings of The 9th Conference on Robot Learning, PMLR 305:1672-1689, 2025.

Abstract

Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion tasks, yet direct real-world RL from scratch or starting from pretrained policies remains rare, limiting the full potential of humanoid robots. Real-world training, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges related to safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework where a robotic arm teacher actively supports and guides a humanoid student robot. The RTR system provides protection, schedule, reward, perturbation, failure detection, and automatic resets, enabling efficient long-term real-world training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method through two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems.
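The abstract's central idea, adapting only a single dynamics-encoded latent variable in the real world while the pretrained policy weights stay frozen, can be sketched as a gradient-free search over that latent. The snippet below is a hedged illustration only, not the paper's implementation: the latent dimension, the cross-entropy-method optimizer, and the rollout_return placeholder are all assumptions introduced here for clarity.

# Hedged sketch (not the authors' code): adapt only a low-dimensional,
# dynamics-encoded latent z in the real world; the pretrained latent-conditioned
# policy pi(a | s, z) is assumed frozen. Optimizer choice (cross-entropy method)
# and the reward placeholder are assumptions for illustration.
import numpy as np

LATENT_DIM = 8          # assumed size of the dynamics latent
ROLLOUTS_PER_ITER = 16  # real-world episodes evaluated per update
ELITE_FRAC = 0.25       # fraction of rollouts kept as elites

def rollout_return(z: np.ndarray) -> float:
    """Placeholder for one real-world episode: run the frozen policy conditioned
    on z on the arm-supported humanoid and return the measured episode reward
    (e.g., negative speed-tracking error). Here: a synthetic quadratic stand-in."""
    target = np.linspace(0.5, -0.5, LATENT_DIM)
    return -float(np.sum((z - target) ** 2))

def adapt_latent(iters: int = 20, seed: int = 0) -> np.ndarray:
    """Gradient-free search over z only, via the cross-entropy method."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(LATENT_DIM), np.ones(LATENT_DIM)
    n_elite = max(1, int(ELITE_FRAC * ROLLOUTS_PER_ITER))
    for _ in range(iters):
        zs = mu + sigma * rng.standard_normal((ROLLOUTS_PER_ITER, LATENT_DIM))
        returns = np.array([rollout_return(z) for z in zs])
        elite = zs[np.argsort(returns)[-n_elite:]]       # best-performing latents
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

if __name__ == "__main__":
    z_star = adapt_latent()
    print("adapted latent:", np.round(z_star, 3))

In this reading, only a handful of real-world rollouts per iteration are needed because the search space is a single compact latent rather than the full policy parameters; the specific optimizer and episode budget used in the paper are not stated on this page.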

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-hu25a,
  title     = {Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids},
  author    = {Hu, Kaizhe and Shi, Haochen and He, Yao and Wang, Weizhuo and Liu, Karen and Song, Shuran},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {1672--1689},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/hu25a/hu25a.pdf},
  url       = {https://proceedings.mlr.press/v305/hu25a.html},
  abstract  = {Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion tasks, yet direct real-world RL from scratch or starting from pretrained policies remains rare, limiting the full potential of humanoid robots. Real-world training, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges related to safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework where a robotic arm teacher actively supports and guides a humanoid student robot. The RTR system provides protection, schedule, reward, perturbation, failure detection, and automatic resets, enabling efficient long-term real-world training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method through two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems.}
}
Endnote
%0 Conference Paper
%T Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
%A Kaizhe Hu
%A Haochen Shi
%A Yao He
%A Weizhuo Wang
%A Karen Liu
%A Shuran Song
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-hu25a
%I PMLR
%P 1672--1689
%U https://proceedings.mlr.press/v305/hu25a.html
%V 305
%X Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion tasks, yet direct real-world RL from scratch or starting from pretrained policies remains rare, limiting the full potential of humanoid robots. Real-world training, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges related to safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework where a robotic arm teacher actively supports and guides a humanoid student robot. The RTR system provides protection, schedule, reward, perturbation, failure detection, and automatic resets, enabling efficient long-term real-world training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method through two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems.
APA
Hu, K., Shi, H., He, Y., Wang, W., Liu, K. & Song, S. (2025). Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:1672-1689. Available from https://proceedings.mlr.press/v305/hu25a.html.