High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Kai Ploeger, Michael Lutter, Jan Peters
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:642-653, 2021.

Abstract

Robots that can learn in the physical world will be important for moving beyond stiff, pre-programmed movements. For dynamic, high-acceleration tasks such as juggling, learning in the real world is particularly challenging, as one must push the limits of the robot and its actuation without harming the system, amplifying the need for sample efficiency and safety in robot learning algorithms. In contrast to prior work, which mainly focuses on the learning algorithm, we propose a learning system that directly incorporates these requirements into the design of the policy representation, initialization, and optimization. We demonstrate that this system enables the high-speed Barrett WAM manipulator to learn to juggle two balls from 56 minutes of experience with a binary reward signal, and subsequently to juggle continuously for up to 33 minutes, or about 4500 repeated catches. Videos documenting the learning process and the evaluation can be found at https://sites.google.com/view/jugglingbot
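
The abstract does not detail the policy representation or the optimizer; as a purely illustrative sketch (not the authors' method), the toy Python script below shows how an episodic, reward-weighted parameter update can be driven by a binary catch count alone. The function run_episode, the parameter dimensionality, and all constants are hypothetical stand-ins for real robot rollouts.

import numpy as np

# Hypothetical sketch of episodic policy search with a binary (per-catch) reward.
# This is NOT the authors' implementation; it only illustrates reward-weighted
# updating of a parameter vector theta (e.g., via-points of a juggling stroke).

def run_episode(theta, rng):
    """Placeholder for a real robot rollout: returns the number of catches.
    A toy surrogate is used here so that the script runs stand-alone."""
    target = np.linspace(0.0, 1.0, theta.size)  # imaginary "good" parameters
    error = np.linalg.norm(theta - target)
    # More catches the closer theta is to the target, corrupted by noise.
    return max(0, int(20.0 * np.exp(-error) + rng.normal(0.0, 1.0)))

def reward_weighted_update(mean, std, n_samples=20, temperature=2.0, seed=0):
    """One iteration: sample parameters, roll out, re-weight by exponentiated reward."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(mean, std, size=(n_samples, mean.size))
    rewards = np.array([run_episode(t, rng) for t in thetas], dtype=float)
    weights = np.exp((rewards - rewards.max()) / temperature)
    weights /= weights.sum()
    new_mean = weights @ thetas
    new_std = np.sqrt(weights @ (thetas - new_mean) ** 2)  # diagonal covariance
    return new_mean, new_std, rewards.mean()

if __name__ == "__main__":
    mean, std = np.zeros(8), np.full(8, 0.5)  # 8 free parameters, e.g. via-points
    for it in range(30):
        mean, std, avg_catches = reward_weighted_update(mean, std, seed=it)
        print(f"iteration {it:02d}: average catches = {avg_catches:.1f}")

Exponentiating the (shifted) reward keeps the update well defined even when most rollouts score zero catches, which is the main difficulty of learning from a binary success signal.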

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-ploeger21a,
  title     = {High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards},
  author    = {Ploeger, Kai and Lutter, Michael and Peters, Jan},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {642--653},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/ploeger21a/ploeger21a.pdf},
  url       = {https://proceedings.mlr.press/v155/ploeger21a.html}
}
Endnote
%0 Conference Paper
%T High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards
%A Kai Ploeger
%A Michael Lutter
%A Jan Peters
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-ploeger21a
%I PMLR
%P 642--653
%U https://proceedings.mlr.press/v155/ploeger21a.html
%V 155
APA
Ploeger, K., Lutter, M. & Peters, J. (2021). High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:642-653. Available from https://proceedings.mlr.press/v155/ploeger21a.html.