Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment

Fabio Muratore; Felix Treede; Michael Gienger; Jan Peters

Proceedings of The 2nd Conference on Robot Learning, PMLR 87:700-713, 2018.

Abstract

Exploration-based reinforcement learning on real robot systems is generally time-intensive and can lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of robot reinforcement learning. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by perturbing the simulator’s parameters and training from model ensembles. We propose a new algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses a biased estimator of the SOB to formulate a stopping criterion for training. We show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.
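The abstract's central observation, that a policy optimized on a finite set of randomized simulator instances looks better in-sample than it really is, can be checked numerically on a toy problem. The Python sketch below is a hypothetical illustration, not the authors' SPOTA code: the scalar parameter θ standing in for a policy, the domain parameter ξ ~ N(0, σ²) standing in for randomized physics, the return -(θ - ξ)², and every function name are assumptions. For this return the in-sample optimum over n domains is the sample mean of ξ, and its expected optimistic bias is exactly σ²/n, so the Monte Carlo estimate should approach that value. The last function is a simplified stopping rule that only mirrors the idea of halting once an upward-biased (conservative) optimality-gap estimate on fresh domains drops below a threshold β; the paper's actual criterion is not reproduced.

```python
import numpy as np


def expected_return(theta, sigma):
    """Toy model (assumed): closed-form E_xi[-(theta - xi)^2] for xi ~ N(0, sigma^2)."""
    return -(theta ** 2 + sigma ** 2)


def sample_return(theta, xis):
    """Sample-average return J_hat_n(theta) over the randomized domains in xis."""
    return -np.mean((theta - xis) ** 2, axis=-1)


def optimize_on_domains(xis):
    """In-sample optimum: for this toy return, argmax J_hat_n is the mean of xis."""
    return np.mean(xis, axis=-1)


def simulation_optimization_bias(n, sigma=1.0, trials=20_000, seed=0):
    """Monte Carlo estimate of E[max_theta J_hat_n(theta)] - max_theta J(theta)."""
    rng = np.random.default_rng(seed)
    xis = rng.normal(0.0, sigma, size=(trials, n))          # n domains per trial
    theta_n = optimize_on_domains(xis)                      # shape (trials,)
    in_sample_best = sample_return(theta_n[:, None], xis)   # shape (trials,)
    true_best = expected_return(0.0, sigma)                 # theta* = 0 here
    return float(np.mean(in_sample_best) - true_best)


def estimated_optimality_gap(theta_c, xis_val):
    """Gap estimate max_theta J_hat_m(theta) - J_hat_m(theta_c) on fresh domains.

    Its expectation exceeds the true gap (by sigma^2 / m in this toy model),
    so it errs on the conservative side.
    """
    theta_val = optimize_on_domains(xis_val)
    return float(sample_return(theta_val, xis_val) - sample_return(theta_c, xis_val))


def train_until_gap_small(beta=0.05, n0=4, m=200, sigma=1.0, seed=0, max_rounds=10):
    """Hypothetical loop: grow the domain ensemble until the gap estimate < beta."""
    rng = np.random.default_rng(seed)
    for k in range(max_rounds):
        n = n0 * 2 ** k                                      # training domains this round
        theta_c = float(optimize_on_domains(rng.normal(0.0, sigma, size=n)))
        gap = estimated_optimality_gap(theta_c, rng.normal(0.0, sigma, size=m))
        if gap < beta:                                       # simplified stopping rule
            break
    return theta_c, n, gap


if __name__ == "__main__":
    for n in (2, 5, 20):
        print(n, simulation_optimization_bias(n))  # should be close to sigma^2 / n
    print(train_until_gap_small())
```

Doubling the number of training domains each round is an arbitrary choice for growing the ensemble; because the gap estimate is biased upward by σ²/m on m validation domains, this rule tends to train longer rather than stop too early.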

Cite this Paper

BibTeX


@InProceedings{pmlr-v87-muratore18a,
  title     = {Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment},
  author    = {Muratore, Fabio and Treede, Felix and Gienger, Michael and Peters, Jan},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {700--713},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/muratore18a/muratore18a.pdf},
  url       = {https://proceedings.mlr.press/v87/muratore18a.html},
  abstract  = {Exploration-based reinforcement learning on real robot systems is generally time-intensive and can lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of robot reinforcement learning. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by perturbing the simulator’s parameters and training from model ensembles. We propose a new algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses a biased estimator of the SOB to formulate a stopping criterion for training. We show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.}
}

Endnote

%0 Conference Paper
%T Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment
%A Fabio Muratore
%A Felix Treede
%A Michael Gienger
%A Jan Peters
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-muratore18a
%I PMLR
%P 700--713
%U https://proceedings.mlr.press/v87/muratore18a.html
%V 87
%X Exploration-based reinforcement learning on real robot systems is generally time-intensive and can lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of robot reinforcement learning. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by perturbing the simulator’s parameters and training from model ensembles. We propose a new algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses a biased estimator of the SOB to formulate a stopping criterion for training. We show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.

APA


Muratore, F., Treede, F., Gienger, M. & Peters, J. (2018). Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:700-713. Available from https://proceedings.mlr.press/v87/muratore18a.html.
