MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12600-12610, 2021.

Abstract

Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
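
To make the exploration objective concrete, below is a minimal sketch of an information-gain intrinsic reward of the kind the abstract describes: the agent is rewarded for transitions that increase how confidently a learned task-inference model identifies the current task. This is an illustration under stated assumptions, not the paper's exact derivation; the model class TaskInference, its log_prob interface, the discrete task space, and all hyperparameters are hypothetical choices made for this example.

    # Sketch of an information-gain intrinsic reward for meta-RL exploration.
    # Assumption-laden illustration, not MetaCURE's exact formulation.
    import torch
    import torch.nn as nn

    class TaskInference(nn.Module):
        """Hypothetical amortized task-inference model q(z | context) over a
        discrete set of training tasks (an assumption for this sketch)."""
        def __init__(self, context_dim: int, num_tasks: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(context_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, num_tasks),
            )

        def log_prob(self, context: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
            # context: (batch, context_dim) summary of the trajectory so far;
            # task_id: (batch,) index of the true task z*.
            logits = self.net(context)
            return torch.log_softmax(logits, dim=-1).gather(
                1, task_id.unsqueeze(-1)).squeeze(-1)

    def intrinsic_reward(q: TaskInference,
                         ctx_before: torch.Tensor,
                         ctx_after: torch.Tensor,
                         task_id: torch.Tensor) -> torch.Tensor:
        # Information gain about task identity from one new transition:
        #   r_t = log q(z* | tau_{0:t+1}) - log q(z* | tau_{0:t}).
        # Transitions that make the true task easier to identify earn positive
        # reward; uninformative transitions earn roughly zero.
        with torch.no_grad():
            return q.log_prob(ctx_after, task_id) - q.log_prob(ctx_before, task_id)

In the framework the abstract outlines, a reward of this form would drive a dedicated exploration policy to collect task-identifying experience, while a separate context-aware exploitation policy reuses the same inference module to act on the identified task.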

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zhang21w,
  title     = {MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration},
  author    = {Zhang, Jin and Wang, Jianhao and Hu, Hao and Chen, Tong and Chen, Yingfeng and Fan, Changjie and Zhang, Chongjie},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12600--12610},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhang21w/zhang21w.pdf},
  url       = {https://proceedings.mlr.press/v139/zhang21w.html},
  abstract  = {Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.}
}
Endnote
%0 Conference Paper
%T MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
%A Jin Zhang
%A Jianhao Wang
%A Hao Hu
%A Tong Chen
%A Yingfeng Chen
%A Changjie Fan
%A Chongjie Zhang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhang21w
%I PMLR
%P 12600--12610
%U https://proceedings.mlr.press/v139/zhang21w.html
%V 139
%X Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
APA
Zhang, J., Wang, J., Hu, H., Chen, T., Chen, Y., Fan, C. & Zhang, C. (2021). MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12600-12610. Available from https://proceedings.mlr.press/v139/zhang21w.html.
