Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:839-855, 2018.

Abstract

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
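The sketch below illustrates the three objectives described in the abstract, evaluated on rollouts of a candidate policy under the learned dynamics model. It is a minimal Python sketch under stated assumptions, not the paper's implementation: the interfaces `model.predict`, `policy`, and `sparse_reward_fn` are hypothetical placeholders, and the novelty and confidence measures are simple stand-ins for whatever the paper uses.

```python
import numpy as np

# Hypothetical interfaces -- `model.predict(s, a)` is assumed to return a
# predicted next state plus an uncertainty estimate (e.g., predictive variance
# of a probabilistic model); `policy(s)` returns an action.

def sparse_reward_fn(state):
    """Placeholder sparse reward: 1.0 only inside a small goal region."""
    goal = np.zeros_like(state)
    return 1.0 if np.linalg.norm(state - goal) < 0.05 else 0.0

def rollout(policy, model, s0, horizon):
    """Simulate one episode with the learned dynamics model (no robot interaction)."""
    states, rewards, uncertainties = [s0], [], []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        s_next, sigma = model.predict(s, a)
        rewards.append(sparse_reward_fn(s_next))
        uncertainties.append(np.mean(sigma))
        states.append(s_next)
        s = s_next
    return np.asarray(states), np.asarray(rewards), np.asarray(uncertainties)

def objectives(policy, model, s0, horizon, archive):
    """Three objectives (all maximized) for one candidate policy."""
    states, rewards, uncertainties = rollout(policy, model, s0, horizon)
    # (1) novelty: distance of the predicted trajectory to previously seen ones
    novelty = (min(np.linalg.norm(states - past) for past in archive)
               if archive else np.inf)
    # (2) cumulative (possibly sparse) reward under the model
    cumulative_reward = rewards.sum()
    # (3) model confidence: prefer state-space regions where the model is accurate
    model_confidence = -uncertainties.sum()
    return novelty, cumulative_reward, model_confidence

# A Pareto-based multi-objective optimizer (e.g., NSGA-II) would search over
# policy parameters with these objectives; a policy from the resulting Pareto
# front is executed on the real system, the new transitions refine the model,
# and the loop repeats.
```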

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-kaushik18a,
  title     = {Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards},
  author    = {Kaushik, Rituraj and Chatzilygeroudis, Konstantinos and Mouret, Jean-Baptiste},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {839--855},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/kaushik18a/kaushik18a.pdf},
  url       = {https://proceedings.mlr.press/v87/kaushik18a.html},
  abstract  = {The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.}
}
Endnote
%0 Conference Paper
%T Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards
%A Rituraj Kaushik
%A Konstantinos Chatzilygeroudis
%A Jean-Baptiste Mouret
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-kaushik18a
%I PMLR
%P 839--855
%U https://proceedings.mlr.press/v87/kaushik18a.html
%V 87
%X The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
APA
Kaushik, R., Chatzilygeroudis, K. & Mouret, J.-B. (2018). Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:839-855. Available from https://proceedings.mlr.press/v87/kaushik18a.html.
