Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning

Gabriel Kalweit, Joschka Boedecker
Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:195-206, 2017.

Abstract

Continuous control of high-dimensional systems can be achieved by current state-of-the-art reinforcement learning methods such as the Deep Deterministic Policy Gradient (DDPG) algorithm, but requires a significant number of data samples. For real-world systems this can be an obstacle, since excessive data collection can be expensive, tedious, or lead to physical damage. The main incentive of this work is to keep the advantages of model-free Q-learning while minimizing real-world interaction by employing a dynamics model learned in parallel. To counteract the adverse effects of imaginary rollouts with an inaccurate model, a notion of uncertainty is introduced so that artificial data is used only in cases of high uncertainty. We evaluate our approach on three simulated robot tasks and achieve learning that is at least 40 per cent faster than vanilla DDPG with multiple updates.
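
The uncertainty-gated use of imagined data described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes uncertainty is measured as disagreement within an ensemble of Q-estimates, and that a learned dynamics model, reward function, and policy are available as plain callables. All names, the rollout horizon, and the threshold below are hypothetical.

import numpy as np

def q_uncertainty(q_ensemble, state, action):
    # Standard deviation across an ensemble of Q-estimates as an uncertainty proxy (assumption).
    values = np.array([q(state, action) for q in q_ensemble])
    return float(values.std())

def imagine_rollout(dynamics_model, reward_fn, policy, state, horizon=3):
    # Generate a short imaginary trajectory with the learned (and therefore approximate) dynamics model.
    transitions = []
    for _ in range(horizon):
        action = policy(state)
        next_state = dynamics_model(state, action)
        reward = reward_fn(state, action, next_state)
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions

def maybe_imagine(q_ensemble, dynamics_model, reward_fn, policy,
                  state, imagined_buffer, threshold=0.5):
    # Add artificial transitions only when the Q-ensemble disagrees, i.e. uncertainty is high.
    action = policy(state)
    if q_uncertainty(q_ensemble, state, action) > threshold:
        imagined_buffer.extend(imagine_rollout(dynamics_model, reward_fn, policy, state))

Q-learning minibatches would then be drawn from a mixture of the real replay buffer and the imagined buffer; the paper's actual uncertainty measure, rollout horizon, and mixing scheme may differ from this sketch.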

Cite this Paper


BibTeX
@InProceedings{pmlr-v78-kalweit17a,
  title     = {Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning},
  author    = {Kalweit, Gabriel and Boedecker, Joschka},
  booktitle = {Proceedings of the 1st Annual Conference on Robot Learning},
  pages     = {195--206},
  year      = {2017},
  editor    = {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume    = {78},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v78/kalweit17a/kalweit17a.pdf},
  url       = {https://proceedings.mlr.press/v78/kalweit17a.html},
  abstract  = {Continuous control of high-dimensional systems can be achieved by current state-of-the-art reinforcement learning methods such as the Deep Deterministic Policy Gradient (DDPG) algorithm, but requires a significant number of data samples. For real-world systems this can be an obstacle, since excessive data collection can be expensive, tedious, or lead to physical damage. The main incentive of this work is to keep the advantages of model-free Q-learning while minimizing real-world interaction by employing a dynamics model learned in parallel. To counteract the adverse effects of imaginary rollouts with an inaccurate model, a notion of uncertainty is introduced so that artificial data is used only in cases of high uncertainty. We evaluate our approach on three simulated robot tasks and achieve learning that is at least 40 per cent faster than vanilla DDPG with multiple updates.}
}
Endnote
%0 Conference Paper
%T Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning
%A Gabriel Kalweit
%A Joschka Boedecker
%B Proceedings of the 1st Annual Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Sergey Levine
%E Vincent Vanhoucke
%E Ken Goldberg
%F pmlr-v78-kalweit17a
%I PMLR
%P 195--206
%U https://proceedings.mlr.press/v78/kalweit17a.html
%V 78
%X Continuous control of high-dimensional systems can be achieved by current state-of-the-art reinforcement learning methods such as the Deep Deterministic Policy Gradient (DDPG) algorithm, but requires a significant number of data samples. For real-world systems this can be an obstacle, since excessive data collection can be expensive, tedious, or lead to physical damage. The main incentive of this work is to keep the advantages of model-free Q-learning while minimizing real-world interaction by employing a dynamics model learned in parallel. To counteract the adverse effects of imaginary rollouts with an inaccurate model, a notion of uncertainty is introduced so that artificial data is used only in cases of high uncertainty. We evaluate our approach on three simulated robot tasks and achieve learning that is at least 40 per cent faster than vanilla DDPG with multiple updates.
APA
Kalweit, G. & Boedecker, J. (2017). Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:195-206. Available from https://proceedings.mlr.press/v78/kalweit17a.html.