Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav Sukhatme, Stefan Schaal, Sergey Levine
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:703-711, 2017.

Abstract

Reinforcement learning algorithms for real-world robotic applications must be able to handle complex, unknown dynamical systems while maintaining data-efficient learning. These requirements are handled well by model-free and model-based RL approaches, respectively. In this work, we aim to combine the advantages of these approaches. By focusing on time-varying linear-Gaussian policies, we enable a model-based algorithm based on the linear-quadratic regulator that can be integrated into the model-free framework of path integral policy improvement. We can further combine our method with guided policy search to train arbitrary parameterized policies such as deep neural networks. Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than model-free methods while maintaining the sample efficiency of model-based methods.
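
The abstract names the main ingredients of the method: time-varying linear-Gaussian policies, a model-based update derived from the linear-quadratic regulator (LQR), and a model-free update in the path integral policy improvement (PI²) framework. Purely as an illustrative, hedged sketch (not code from the paper), the snippet below shows the policy structure u_t ~ N(K_t x_t + k_t, Sigma_t) and a generic PI²-style cost-weighted correction of the open-loop terms; the dimensions, placeholder cost, and temperature eta are assumptions, and in the paper's setting the feedback gains K_t would come from an LQR backward pass under fitted local dynamics.

    import numpy as np

    # Hypothetical sketch only: a time-varying linear-Gaussian policy
    # u_t ~ N(K_t x_t + k_t, Sigma_t) with a PI^2-style cost-weighted update of the
    # open-loop terms k_t. Dimensions, cost, and temperature are placeholder assumptions.
    T, dx, du, N = 20, 4, 2, 30          # horizon, state dim, action dim, sampled rollouts
    K = np.zeros((T, du, dx))            # feedback gains (e.g. from an LQR backward pass)
    k = np.zeros((T, du))                # open-loop terms, corrected model-free below
    Sigma = np.stack([np.eye(du)] * T)   # per-time-step exploration covariances

    def step_costs(noise_traj):
        # Placeholder cost: per-time-step cost of one noisy rollout (replace with the task cost).
        return np.sum(noise_traj ** 2, axis=1)

    eta = 1.0                                      # PI^2 temperature
    noise = np.random.randn(N, T, du)              # exploration noise, one trajectory per sample
    S = np.stack([np.cumsum(step_costs(noise[i])[::-1])[::-1] for i in range(N)])  # cost-to-go, (N, T)
    w = np.exp(-(S - S.min(axis=0)) / eta)         # exponentiated per-time-step weights
    w /= w.sum(axis=0)                             # normalize over rollouts
    k = k + np.einsum('it,itu->tu', w, noise)      # cost-weighted correction of open-loop terms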

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-chebotar17a,
  title     = {Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning},
  author    = {Yevgen Chebotar and Karol Hausman and Marvin Zhang and Gaurav Sukhatme and Stefan Schaal and Sergey Levine},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {703--711},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/chebotar17a/chebotar17a.pdf},
  url       = {https://proceedings.mlr.press/v70/chebotar17a.html},
  abstract  = {Reinforcement learning algorithms for real-world robotic applications must be able to handle complex, unknown dynamical systems while maintaining data-efficient learning. These requirements are handled well by model-free and model-based RL approaches, respectively. In this work, we aim to combine the advantages of these approaches. By focusing on time-varying linear-Gaussian policies, we enable a model-based algorithm based on the linear-quadratic regulator that can be integrated into the model-free framework of path integral policy improvement. We can further combine our method with guided policy search to train arbitrary parameterized policies such as deep neural networks. Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than model-free methods while maintaining the sample efficiency of model-based methods.}
}
Endnote
%0 Conference Paper
%T Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
%A Yevgen Chebotar
%A Karol Hausman
%A Marvin Zhang
%A Gaurav Sukhatme
%A Stefan Schaal
%A Sergey Levine
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-chebotar17a
%I PMLR
%P 703--711
%U https://proceedings.mlr.press/v70/chebotar17a.html
%V 70
%X Reinforcement learning algorithms for real-world robotic applications must be able to handle complex, unknown dynamical systems while maintaining data-efficient learning. These requirements are handled well by model-free and model-based RL approaches, respectively. In this work, we aim to combine the advantages of these approaches. By focusing on time-varying linear-Gaussian policies, we enable a model-based algorithm based on the linear-quadratic regulator that can be integrated into the model-free framework of path integral policy improvement. We can further combine our method with guided policy search to train arbitrary parameterized policies such as deep neural networks. Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than model-free methods while maintaining the sample efficiency of model-based methods.
APA
Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S. & Levine, S. (2017). Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:703-711. Available from https://proceedings.mlr.press/v70/chebotar17a.html.