Model-Based Reinforcement Learning via Latent-Space Collocation

Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9190-9201, 2021.

Abstract

The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad and general capabilities. However, realistic tasks require performing temporally extended reasoning, and cannot be solved with only myopic, short-sighted planning. Recent work in model-based reinforcement learning (RL) has shown impressive results on tasks that require only short-horizon reasoning. In this work, we study how the long-horizon planning abilities can be improved with an algorithm that optimizes over sequences of states, rather than actions, which allows better credit assignment. To achieve this, we draw on the idea of collocation and adapt it to the image-based setting by leveraging probabilistic latent variable models, resulting in an algorithm that optimizes trajectories over latent variables. Our latent collocation method (LatCo) provides a general and effective visual planning approach, and significantly outperforms prior model-based approaches on challenging visual control tasks with sparse rewards and long-term goals. See the videos on the supplementary website \url{https://sites.google.com/view/latco-mbrl/.}

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-rybkin21b, title = {Model-Based Reinforcement Learning via Latent-Space Collocation}, author = {Rybkin, Oleh and Zhu, Chuning and Nagabandi, Anusha and Daniilidis, Kostas and Mordatch, Igor and Levine, Sergey}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {9190--9201}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/rybkin21b/rybkin21b.pdf}, url = {https://proceedings.mlr.press/v139/rybkin21b.html}, abstract = {The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad and general capabilities. However, realistic tasks require performing temporally extended reasoning, and cannot be solved with only myopic, short-sighted planning. Recent work in model-based reinforcement learning (RL) has shown impressive results on tasks that require only short-horizon reasoning. In this work, we study how the long-horizon planning abilities can be improved with an algorithm that optimizes over sequences of states, rather than actions, which allows better credit assignment. To achieve this, we draw on the idea of collocation and adapt it to the image-based setting by leveraging probabilistic latent variable models, resulting in an algorithm that optimizes trajectories over latent variables. Our latent collocation method (LatCo) provides a general and effective visual planning approach, and significantly outperforms prior model-based approaches on challenging visual control tasks with sparse rewards and long-term goals. See the videos on the supplementary website \url{https://sites.google.com/view/latco-mbrl/.}} }
Endnote
%0 Conference Paper %T Model-Based Reinforcement Learning via Latent-Space Collocation %A Oleh Rybkin %A Chuning Zhu %A Anusha Nagabandi %A Kostas Daniilidis %A Igor Mordatch %A Sergey Levine %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-rybkin21b %I PMLR %P 9190--9201 %U https://proceedings.mlr.press/v139/rybkin21b.html %V 139 %X The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad and general capabilities. However, realistic tasks require performing temporally extended reasoning, and cannot be solved with only myopic, short-sighted planning. Recent work in model-based reinforcement learning (RL) has shown impressive results on tasks that require only short-horizon reasoning. In this work, we study how the long-horizon planning abilities can be improved with an algorithm that optimizes over sequences of states, rather than actions, which allows better credit assignment. To achieve this, we draw on the idea of collocation and adapt it to the image-based setting by leveraging probabilistic latent variable models, resulting in an algorithm that optimizes trajectories over latent variables. Our latent collocation method (LatCo) provides a general and effective visual planning approach, and significantly outperforms prior model-based approaches on challenging visual control tasks with sparse rewards and long-term goals. See the videos on the supplementary website \url{https://sites.google.com/view/latco-mbrl/.}
APA
Rybkin, O., Zhu, C., Nagabandi, A., Daniilidis, K., Mordatch, I. & Levine, S.. (2021). Model-Based Reinforcement Learning via Latent-Space Collocation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9190-9201 Available from https://proceedings.mlr.press/v139/rybkin21b.html.

Related Material