Regularizing Model-Based Planning with Energy-Based Models

Rinu Boney, Juho Kannala, Alexander Ilin
Proceedings of the Conference on Robot Learning, PMLR 100:182-191, 2020.

Abstract

Model-based reinforcement learning could enable sample-efficient learning by quickly acquiring rich knowledge about the world and using it to improve behaviour without additional data. Learned dynamics models can be used directly for planning actions, but this has been challenging because of inaccuracies in the learned models. In this paper, we focus on planning with learned dynamics models and propose to regularize it using energy estimates of state transitions in the environment. We visually demonstrate the effectiveness of the proposed method and show that off-policy training of an energy estimator can be used effectively to regularize planning with pre-trained dynamics models. Further, we demonstrate that the proposed method enables sample-efficient learning, achieving competitive performance in challenging continuous control tasks such as Half-cheetah and Ant within just a few minutes of experience.
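The abstract describes the method only at a high level. As a rough illustration of the idea, the sketch below shows a sampling-based planner (cross-entropy method) whose objective is penalized by an energy estimate of the imagined transitions. The choice of CEM, the callables dynamics_model, reward_fn, and energy_fn, and the weight alpha are all assumptions for illustration, not the paper's actual implementation.

import numpy as np

def plan_with_energy_regularization(
    state,                # current environment state, shape (state_dim,)
    dynamics_model,       # learned model: (state, action) -> predicted next state
    reward_fn,            # task reward: (state, action) -> scalar
    energy_fn,            # energy of a transition (state, action, next_state) -> scalar;
                          # assumed low on transitions resembling real environment data
    horizon=20,
    n_candidates=500,
    n_iters=5,
    n_elite=50,
    action_dim=6,
    alpha=1.0,            # strength of the energy regularizer (assumed hyperparameter)
):
    """CEM-style planning regularized by transition energies: candidate action
    sequences whose imagined rollouts have high energy (i.e. look unlike
    transitions observed in the real environment) are penalized, discouraging
    the planner from exploiting inaccuracies of the learned dynamics model."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences around the current distribution.
        actions = mean + std * np.random.randn(n_candidates, horizon, action_dim)
        scores = np.zeros(n_candidates)
        for i in range(n_candidates):
            s = state
            for t in range(horizon):
                a = actions[i, t]
                s_next = dynamics_model(s, a)
                # Planning objective: reward minus an energy penalty
                # on the imagined transition.
                scores[i] += reward_fn(s, a) - alpha * energy_fn(s, a, s_next)
                s = s_next
        # Refit the sampling distribution to the top-scoring sequences.
        elite = actions[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # first action of the planned sequence (MPC-style execution)

Here alpha trades off reward maximization against staying close to transitions the model has actually seen; the paper's concrete planner and energy estimator are described in the full text.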

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-boney20a,
  title     = {Regularizing Model-Based Planning with Energy-Based Models},
  author    = {Boney, Rinu and Kannala, Juho and Ilin, Alexander},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {182--191},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/boney20a/boney20a.pdf},
  url       = {https://proceedings.mlr.press/v100/boney20a.html}
}
Endnote
%0 Conference Paper
%T Regularizing Model-Based Planning with Energy-Based Models
%A Rinu Boney
%A Juho Kannala
%A Alexander Ilin
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-boney20a
%I PMLR
%P 182--191
%U https://proceedings.mlr.press/v100/boney20a.html
%V 100
APA
Boney, R., Kannala, J., & Ilin, A. (2020). Regularizing Model-Based Planning with Energy-Based Models. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:182-191. Available from https://proceedings.mlr.press/v100/boney20a.html.