Neural Posterior Domain Randomization

Fabio Muratore, Theo Gruner, Florian Wiese, Boris Belousov, Michael Gienger, Jan Peters
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1532-1542, 2022.

Abstract

Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions on the form of the domain parameter distribution which prevents them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to closer match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-muratore22a, title = {Neural Posterior Domain Randomization}, author = {Muratore, Fabio and Gruner, Theo and Wiese, Florian and Belousov, Boris and Gienger, Michael and Peters, Jan}, booktitle = {Proceedings of the 5th Conference on Robot Learning}, pages = {1532--1542}, year = {2022}, editor = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard}, volume = {164}, series = {Proceedings of Machine Learning Research}, month = {08--11 Nov}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v164/muratore22a/muratore22a.pdf}, url = {https://proceedings.mlr.press/v164/muratore22a.html}, abstract = {Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions on the form of the domain parameter distribution which prevents them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to closer match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.} }
Endnote
%0 Conference Paper %T Neural Posterior Domain Randomization %A Fabio Muratore %A Theo Gruner %A Florian Wiese %A Boris Belousov %A Michael Gienger %A Jan Peters %B Proceedings of the 5th Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2022 %E Aleksandra Faust %E David Hsu %E Gerhard Neumann %F pmlr-v164-muratore22a %I PMLR %P 1532--1542 %U https://proceedings.mlr.press/v164/muratore22a.html %V 164 %X Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions on the form of the domain parameter distribution which prevents them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to closer match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.
APA
Muratore, F., Gruner, T., Wiese, F., Belousov, B., Gienger, M. & Peters, J.. (2022). Neural Posterior Domain Randomization. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1532-1542 Available from https://proceedings.mlr.press/v164/muratore22a.html.

Related Material