MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang; Jianhao Wang; Hao Hu; Tong Chen; Yingfeng Chen; Changjie Fan; Chongjie Zhang

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12600-12610, 2021.

Abstract

Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-zhang21w,
  title     = {MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration},
  author    = {Zhang, Jin and Wang, Jianhao and Hu, Hao and Chen, Tong and Chen, Yingfeng and Fan, Changjie and Zhang, Chongjie},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12600--12610},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhang21w/zhang21w.pdf},
  url       = {https://proceedings.mlr.press/v139/zhang21w.html},
  abstract  = {Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.}
}

Endnote

%0 Conference Paper
%T MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
%A Jin Zhang
%A Jianhao Wang
%A Hao Hu
%A Tong Chen
%A Yingfeng Chen
%A Changjie Fan
%A Chongjie Zhang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhang21w
%I PMLR
%P 12600--12610
%U https://proceedings.mlr.press/v139/zhang21w.html
%V 139
%X Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.

APA


Zhang, J., Wang, J., Hu, H., Chen, T., Chen, Y., Fan, C. & Zhang, C. (2021). MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12600-12610. Available from https://proceedings.mlr.press/v139/zhang21w.html.
