Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Rituraj Kaushik; Konstantinos Chatzilygeroudis; Jean-Baptiste Mouret

Proceedings of The 2nd Conference on Robot Learning, PMLR 87:839-855, 2018.

Abstract

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
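
As a rough illustration of what "Pareto-based" selection over the three objectives means, the sketch below filters a set of candidate policies down to the non-dominated ones. It is not the authors' implementation: the abstract does not say which multi-objective optimizer is used, and the score tuples, the names `dominates` and `pareto_front`, and the example values are all hypothetical. Each candidate is assumed to have already been scored on (novelty, cumulative reward, model confidence), with higher being better on every objective.

```python
from typing import List, Sequence, Tuple

# Hypothetical score layout: (novelty, cumulative_reward, model_confidence).
# All three objectives are assumed to be maximized.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for four candidate policies (illustrative values only).
candidates: List[Objectives] = [
    (0.9, 0.0, 0.4),  # very novel, no reward yet
    (0.2, 1.0, 0.8),  # rewarding, not very novel
    (0.1, 0.0, 0.3),  # worse than the first candidate on every objective
    (0.5, 0.5, 0.9),  # balanced, stays where the model is trusted
]

front = pareto_front(candidates)
print(front)
```

Keeping the whole non-dominated set, rather than collapsing the objectives into one weighted sum, means a highly novel policy with zero reward can survive selection, which is the property that matters when rewards are sparse.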

Cite this Paper

BibTeX


@InProceedings{pmlr-v87-kaushik18a,
  title = 	 {Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards},
  author =       {Kaushik, Rituraj and Chatzilygeroudis, Konstantinos and Mouret, Jean-Baptiste},
  booktitle = 	 {Proceedings of The 2nd Conference on Robot Learning},
  pages = 	 {839--855},
  year = 	 {2018},
  editor = 	 {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume = 	 {87},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--31 Oct},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v87/kaushik18a/kaushik18a.pdf},
  url = 	 {https://proceedings.mlr.press/v87/kaushik18a.html},
  abstract = 	 {The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS. }
}

Endnote

%0 Conference Paper
%T Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards
%A Rituraj Kaushik
%A Konstantinos Chatzilygeroudis
%A Jean-Baptiste Mouret
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-kaushik18a
%I PMLR
%P 839--855
%U https://proceedings.mlr.press/v87/kaushik18a.html
%V 87
%X The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.

APA


Kaushik, R., Chatzilygeroudis, K. & Mouret, J.-B. (2018). Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:839-855. Available from https://proceedings.mlr.press/v87/kaushik18a.html.
