Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller
Proceedings of the Conference on Robot Learning, PMLR 100:566-589, 2020.

Abstract

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19
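To make the core idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of an "imagined value gradient" policy update as described in the abstract: encode an observation into a latent state, roll a learned latent dynamics model forward under the current policy for a few imagined steps, sum predicted rewards plus a terminal value estimate, and differentiate that imagined return with respect to the policy parameters. All networks here (encode, dynamics, reward, value, policy) are tiny hypothetical stand-ins for the learned components; shapes and hyperparameters are placeholders.

import jax
import jax.numpy as jnp

# Hypothetical tiny linear networks standing in for the learned encoder,
# latent dynamics, reward, value, and policy models described in the abstract.
def encode(p, obs):    return jnp.tanh(p["enc"] @ obs)
def dynamics(p, z, a): return jnp.tanh(p["dyn"] @ jnp.concatenate([z, a]))
def reward(p, z, a):   return (p["rew"] @ jnp.concatenate([z, a]))[0]
def value(p, z):       return (p["val"] @ z)[0]
def policy(theta, z):  return jnp.tanh(theta @ z)

def imagined_return(theta, p, obs, horizon=5, gamma=0.99):
    """H-step imagined return: discounted predicted rewards plus a terminal
    value, computed by rolling the latent model forward under the policy."""
    z, ret, disc = encode(p, obs), 0.0, 1.0
    for _ in range(horizon):
        a = policy(theta, z)
        ret += disc * reward(p, z, a)
        z = dynamics(p, z, a)      # imagined next latent state
        disc *= gamma
    return ret + disc * value(p, z)

# Policy improvement: gradient of the imagined return w.r.t. policy parameters.
policy_grad_fn = jax.grad(imagined_return, argnums=0)

# Example usage with random parameters (dimensions are illustrative only).
key = jax.random.PRNGKey(0)
obs_dim, z_dim, a_dim = 8, 4, 2
ks = jax.random.split(key, 5)
params = {
    "enc": 0.1 * jax.random.normal(ks[0], (z_dim, obs_dim)),
    "dyn": 0.1 * jax.random.normal(ks[1], (z_dim, z_dim + a_dim)),
    "rew": 0.1 * jax.random.normal(ks[2], (1, z_dim + a_dim)),
    "val": 0.1 * jax.random.normal(ks[3], (1, z_dim)),
}
theta = 0.1 * jax.random.normal(ks[4], (a_dim, z_dim))
obs = jnp.ones(obs_dim)
grad_theta = policy_grad_fn(theta, params, obs)  # ascend to improve the policy

In the paper this gradient is taken through a model learned from vision and proprioception, and the same model is reused when transferring to tasks with new reward structures; the sketch above only illustrates the differentiation-through-imagined-rollouts mechanism.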

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-byravan20a,
  title     = {Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models},
  author    = {Byravan, Arunkumar and Springenberg, Jost Tobias and Abdolmaleki, Abbas and Hafner, Roland and Neunert, Michael and Lampe, Thomas and Siegel, Noah and Heess, Nicolas and Riedmiller, Martin},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {566--589},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/byravan20a/byravan20a.pdf},
  url       = {https://proceedings.mlr.press/v100/byravan20a.html},
  abstract  = {Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19}
}
Endnote
%0 Conference Paper
%T Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
%A Arunkumar Byravan
%A Jost Tobias Springenberg
%A Abbas Abdolmaleki
%A Roland Hafner
%A Michael Neunert
%A Thomas Lampe
%A Noah Siegel
%A Nicolas Heess
%A Martin Riedmiller
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-byravan20a
%I PMLR
%P 566--589
%U https://proceedings.mlr.press/v100/byravan20a.html
%V 100
%X Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19
APA
Byravan, A., Springenberg, J.T., Abdolmaleki, A., Hafner, R., Neunert, M., Lampe, T., Siegel, N., Heess, N. & Riedmiller, M. (2020). Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:566-589. Available from https://proceedings.mlr.press/v100/byravan20a.html.