Go Beyond Imagination: Maximizing Episodic Reachability with World Models

Yao Fu, Run Peng, Honglak Lee
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10405-10420, 2023.

Abstract

Efficient exploration is a challenging topic in reinforcement learning, especially for sparse reward tasks. To deal with the reward sparsity, people commonly apply intrinsic rewards to motivate agents to explore the state space efficiently. In this paper, we introduce a new intrinsic reward design called GoBI - Go Beyond Imagination, which combines the traditional lifelong novelty motivation with an episodic intrinsic reward that is designed to maximize the stepwise reachability expansion. More specifically, we apply learned world models to generate predicted future states with random actions. States with more unique predictions that are not in episodic memory are assigned high intrinsic rewards. Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks and improves the sample efficiency on locomotion tasks from DeepMind Control Suite.
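The episodic part of the reward described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes hashable states, a hypothetical `world_model(state, action)` callable that returns a predicted next state, a count-based lifelong novelty term, and a multiplicative combination of the two terms. The abstract specifies none of these details.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence, Set

State = Hashable


def episodic_reachability_bonus(
    state: State,
    world_model: Callable[[State, int], State],  # hypothetical learned model
    actions: Sequence[int],
    episodic_memory: Set[State],
    num_samples: int = 8,
    rng: random.Random = None,
) -> int:
    """Count predicted next states that are not yet in episodic memory."""
    rng = rng or random.Random(0)
    episodic_memory.add(state)
    # Imagine futures by feeding randomly chosen actions to the world model.
    predicted = {world_model(state, rng.choice(actions)) for _ in range(num_samples)}
    new_states = predicted - episodic_memory
    # Remember the imagined states so they are not rewarded again this episode.
    episodic_memory |= new_states
    return len(new_states)


def intrinsic_reward(state: State, visit_counts: Counter, bonus: int) -> float:
    """Scale the episodic bonus by a count-based lifelong novelty (an assumption)."""
    visit_counts[state] += 1
    return bonus / visit_counts[state] ** 0.5
```

With a toy model such as `lambda s, a: s + a` on integer states, the first call from a given state returns a positive bonus and a repeated call from the same state returns zero, because the imagined successors are already in episodic memory. The memory set would be cleared at the start of each episode, while `visit_counts` would persist across episodes.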

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-fu23c,
  title = {Go Beyond Imagination: Maximizing Episodic Reachability with World Models},
  author = {Fu, Yao and Peng, Run and Lee, Honglak},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {10405--10420},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/fu23c/fu23c.pdf},
  url = {https://proceedings.mlr.press/v202/fu23c.html}
}
Endnote
%0 Conference Paper
%T Go Beyond Imagination: Maximizing Episodic Reachability with World Models
%A Yao Fu
%A Run Peng
%A Honglak Lee
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-fu23c
%I PMLR
%P 10405--10420
%U https://proceedings.mlr.press/v202/fu23c.html
%V 202
APA
Fu, Y., Peng, R. & Lee, H. (2023). Go Beyond Imagination: Maximizing Episodic Reachability with World Models. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10405-10420. Available from https://proceedings.mlr.press/v202/fu23c.html.
