Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

Nikita Rudin, David Hoeller, Philipp Reist, Marco Hutter
Proceedings of the 5th Conference on Robot Learning, PMLR 164:91-100, 2022.

Abstract

In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged_gym/.

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-rudin22a,
  title     = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
  author    = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {91--100},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/rudin22a/rudin22a.pdf},
  url       = {https://proceedings.mlr.press/v164/rudin22a.html},
  abstract  = {In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged_gym/.}
}
Endnote
%0 Conference Paper
%T Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
%A Nikita Rudin
%A David Hoeller
%A Philipp Reist
%A Marco Hutter
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-rudin22a
%I PMLR
%P 91--100
%U https://proceedings.mlr.press/v164/rudin22a.html
%V 164
%X In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged_gym/.
APA
Rudin, N., Hoeller, D., Reist, P., & Hutter, M. (2022). Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:91-100. Available from https://proceedings.mlr.press/v164/rudin22a.html.