Smooth Exploration for Robotic Reinforcement Learning

Antonin Raffin, Jens Kober, Freek Stulp
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1634-1644, 2022.

Abstract

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL (often very successful in simulation) leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance.
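
The abstract compresses the method into one idea: rather than adding independent Gaussian noise to the action at every step, gSDE makes the noise a function of policy features, eps_t = theta_eps * z(s_t), and re-samples the matrix theta_eps only every n steps, so the perturbation varies smoothly with the state in between. The following is a minimal sketch of that sampling scheme, assuming NumPy; the class and parameter names (GSDENoise, resample_interval) are illustrative rather than taken from the paper, whose reference implementation ships with Stable-Baselines3.

import numpy as np

class GSDENoise:
    """Minimal sketch of generalized state-dependent exploration (gSDE).

    The exploration noise is eps_t = theta_eps @ z(s_t): a deterministic
    function of the policy features z(s_t), where the matrix theta_eps is
    itself Gaussian but only re-sampled every `resample_interval` steps.
    Between re-samples the perturbation varies smoothly with the state,
    avoiding the jerky high-frequency noise of step-based exploration.
    Names and defaults here are illustrative, not taken from the paper.
    """

    def __init__(self, feature_dim, action_dim, log_std=-2.0,
                 resample_interval=16, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng()
        # Per-element noise scale; in the paper this is a learned parameter.
        self.sigma = np.exp(log_std) * np.ones((feature_dim, action_dim))
        self.resample_interval = resample_interval
        self.steps_since_resample = 0
        self.resample()

    def resample(self):
        # Draw the exploration matrix once; it stays fixed for the interval.
        self.theta_eps = self.rng.normal(0.0, self.sigma)
        self.steps_since_resample = 0

    def __call__(self, features):
        # gSDE extension: re-sample periodically instead of once per episode.
        if self.steps_since_resample >= self.resample_interval:
            self.resample()
        self.steps_since_resample += 1
        # State-dependent, temporally coherent perturbation for the action.
        return features @ self.theta_eps

Used inside a rollout, the noisy action would be something like action = policy_mean + noise(policy_features), where policy_mean and policy_features are hypothetical outputs of the policy network. Setting resample_interval to 1 recovers ordinary per-step noise, while larger values trade exploration variability for smoothness, matching the performance/smoothness compromise described in the abstract.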

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-raffin22a,
  title     = {Smooth Exploration for Robotic Reinforcement Learning},
  author    = {Raffin, Antonin and Kober, Jens and Stulp, Freek},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1634--1644},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/raffin22a/raffin22a.pdf},
  url       = {https://proceedings.mlr.press/v164/raffin22a.html},
  abstract  = {Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL – often very successful in simulation – leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance.}
}
Endnote
%0 Conference Paper
%T Smooth Exploration for Robotic Reinforcement Learning
%A Antonin Raffin
%A Jens Kober
%A Freek Stulp
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-raffin22a
%I PMLR
%P 1634--1644
%U https://proceedings.mlr.press/v164/raffin22a.html
%V 164
%X Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL – often very successful in simulation – leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance.
APA
Raffin, A., Kober, J. & Stulp, F. (2022). Smooth Exploration for Robotic Reinforcement Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1634-1644. Available from https://proceedings.mlr.press/v164/raffin22a.html.
