Optimizing Long-term Predictions for Model-based Policy Search

Andreas Doerr, Christian Daniel, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Marc Toussaint, Sebastian Trimpe
Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:227-238, 2017.

Abstract

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.
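The core distinction in the abstract is between fitting a dynamics model on one-step-ahead prediction errors and fitting it on the quality of full simulated rollouts. The sketch below is only a minimal illustration of that distinction on a toy linear system; the linear model, synthetic data, squared-error rollout score, and optimizer are all illustrative assumptions and stand in for the paper's actual generative model and trajectory-likelihood objective.

```python
# Minimal sketch (assumptions, not the paper's method): one-step-ahead vs.
# long-term rollout objectives for fitting a simple dynamics model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic 1-D ground-truth system with observation noise (assumption).
a_true, b_true = 0.95, 0.1
T = 50
u = rng.uniform(-1.0, 1.0, size=T)           # recorded control inputs
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + b_true * u[t]
y = x + 0.05 * rng.standard_normal(T + 1)     # noisy observations

def one_step_loss(theta):
    """Common objective: predict y[t+1] from the *observed* y[t] and u[t]."""
    a, b = theta
    pred = a * y[:-1] + b * u
    return np.mean((y[1:] - pred) ** 2)

def long_term_loss(theta):
    """Rollout objective: simulate the whole trajectory from y[0] only,
    feeding the model's own predictions back in, then score the simulated
    trajectory against the observations (squared error here, standing in
    for the trajectory likelihood used in the paper)."""
    a, b = theta
    sim = np.zeros(T + 1)
    sim[0] = y[0]
    for t in range(T):
        sim[t + 1] = a * sim[t] + b * u[t]
    return np.mean((y - sim) ** 2)

theta0 = np.array([0.5, 0.0])
theta_one = minimize(one_step_loss, theta0).x
theta_long = minimize(long_term_loss, theta0).x
print("one-step fit:  a=%.3f b=%.3f" % tuple(theta_one))
print("long-term fit: a=%.3f b=%.3f" % tuple(theta_long))
```

Under noisy observations, the one-step objective conditions on corrupted states at every step, whereas the rollout objective penalizes the accumulated long-horizon error directly, which is the robustness argument the abstract makes.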

Cite this Paper


BibTeX
@InProceedings{pmlr-v78-doerr17a,
  title     = {Optimizing Long-term Predictions for Model-based Policy Search},
  author    = {Doerr, Andreas and Daniel, Christian and Nguyen-Tuong, Duy and Marco, Alonso and Schaal, Stefan and Toussaint, Marc and Trimpe, Sebastian},
  booktitle = {Proceedings of the 1st Annual Conference on Robot Learning},
  pages     = {227--238},
  year      = {2017},
  editor    = {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume    = {78},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v78/doerr17a/doerr17a.pdf},
  url       = {https://proceedings.mlr.press/v78/doerr17a.html},
  abstract  = {We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.}
}
Endnote
%0 Conference Paper %T Optimizing Long-term Predictions for Model-based Policy Search %A Andreas Doerr %A Christian Daniel %A Duy Nguyen-Tuong %A Alonso Marco %A Stefan Schaal %A Marc Toussaint %A Sebastian Trimpe %B Proceedings of the 1st Annual Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2017 %E Sergey Levine %E Vincent Vanhoucke %E Ken Goldberg %F pmlr-v78-doerr17a %I PMLR %P 227--238 %U https://proceedings.mlr.press/v78/doerr17a.html %V 78 %X We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.
APA
Doerr, A., Daniel, C., Nguyen-Tuong, D., Marco, A., Schaal, S., Toussaint, M. & Trimpe, S. (2017). Optimizing Long-term Predictions for Model-based Policy Search. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:227-238. Available from https://proceedings.mlr.press/v78/doerr17a.html.
