Deep Black-Box Reinforcement Learning with Movement Primitives

Fabian Otto; Onur Celik; Hongyi Zhou; Hanna Ziesche; Vien Anh Ngo; Gerhard Neumann

Proceedings of The 6th Conference on Robot Learning, PMLR 205:1244-1265, 2023.

Abstract

Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that the sparse and the non-Markovian rewards are also often better suited to define the desired behavior, allowing us to obtain considerably higher-quality policies compared to step-based RL.
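
To make the episode-based setting in the abstract concrete, here is a minimal Python sketch of a contextual black-box search loop: a context is sampled, a whole parameter vector is drawn for it, and a single scalar return is observed per episode. Everything in it is an assumption for illustration and not taken from the paper: the toy `episode_return`, the `TARGET_SLOPES` constant, the linear search distribution, and in particular the update rule, which is plain reward-weighted regression standing in for the paper's update based on differentiable trust region layers.

```python
import math
import random

# Hypothetical toy task: the best parameter vector is TARGET_SLOPES * context.
TARGET_SLOPES = (2.0, -1.0)


def episode_return(context, theta):
    """Toy black-box return: one scalar for the whole episode, no per-step reward."""
    return -sum((t - s * context) ** 2 for t, s in zip(theta, TARGET_SLOPES))


def train(iterations=200, batch=64, temperature=1.0, seed=0):
    rng = random.Random(seed)
    slopes = [0.0, 0.0]  # mean of the search distribution is slopes[j] * context
    sigma = 1.0          # exploration noise, applied in parameter space
    for _ in range(iterations):
        # One sample = one context, one parameter vector, one episode return.
        contexts = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        thetas = [[s * c + rng.gauss(0.0, sigma) for s in slopes] for c in contexts]
        returns = [episode_return(c, th) for c, th in zip(contexts, thetas)]

        # Reward-weighted regression (assumed stand-in, not the paper's update):
        # exponentiate returns into weights, then refit the mean by weighted
        # least squares of theta_j on the context.
        best = max(returns)
        weights = [math.exp((r - best) / temperature) for r in returns]
        denom = sum(w * c * c for w, c in zip(weights, contexts)) + 1e-9
        slopes = [
            sum(w * c * th[j] for w, c, th in zip(weights, contexts, thetas)) / denom
            for j in range(len(slopes))
        ]
        sigma = max(0.05, 0.98 * sigma)  # slowly shrink exploration
    return slopes
```

Calling `train()` returns the learned slopes, which should drift toward `TARGET_SLOPES` on this toy problem. The point of the sketch is only the interface: the learner never sees states or per-step rewards, which is why sparse and non-Markovian returns pose no special difficulty in this setting.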

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-otto23a,
  title = 	 {Deep Black-Box Reinforcement Learning with Movement Primitives},
  author =       {Otto, Fabian and Celik, Onur and Zhou, Hongyi and Ziesche, Hanna and Ngo, Vien Anh and Neumann, Gerhard},
  booktitle = 	 {Proceedings of The 6th Conference on Robot Learning},
  pages = 	 {1244--1265},
  year = 	 {2023},
  editor = 	 {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume = 	 {205},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {14--18 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v205/otto23a/otto23a.pdf},
  url = 	 {https://proceedings.mlr.press/v205/otto23a.html},
  abstract = 	 {Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that the sparse and the non-Markovian rewards are also often better suited to define the desired behavior, allowing us to obtain considerably higher-quality policies compared to step-based RL.}
}

Endnote

%0 Conference Paper
%T Deep Black-Box Reinforcement Learning with Movement Primitives
%A Fabian Otto
%A Onur Celik
%A Hongyi Zhou
%A Hanna Ziesche
%A Vien Anh Ngo
%A Gerhard Neumann
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-otto23a
%I PMLR
%P 1244--1265
%U https://proceedings.mlr.press/v205/otto23a.html
%V 205
%X Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that the sparse and the non-Markovian rewards are also often better suited to define the desired behavior, allowing us to obtain considerably higher-quality policies compared to step-based RL.

APA


Otto, F., Celik, O., Zhou, H., Ziesche, H., Ngo, V.A. & Neumann, G. (2023). Deep Black-Box Reinforcement Learning with Movement Primitives. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1244-1265. Available from https://proceedings.mlr.press/v205/otto23a.html.
