Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Luisa M Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12991-13001, 2021.

Abstract

To rapidly learn a new task, it is often essential for agents to explore efficiently, especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods, however, rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent’s task belief). We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
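For intuition, below is a minimal sketch (in PyTorch; the class and variable names are illustrative, not the authors' code) of one way an exploration bonus over hyper-states could be computed, assuming a random-network-distillation-style prediction error on the concatenation of the environment state and the agent's task belief:

    import torch
    import torch.nn as nn

    class HyperStateNoveltyBonus(nn.Module):
        """Illustrative RND-style novelty bonus on hyper-states (state + task belief)."""

        def __init__(self, state_dim: int, belief_dim: int, hidden_dim: int = 64):
            super().__init__()
            in_dim = state_dim + belief_dim  # dimensionality of the hyper-state
            # Fixed, randomly initialised target network (never trained).
            self.target = nn.Sequential(
                nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for p in self.target.parameters():
                p.requires_grad_(False)
            # Predictor network, trained to match the target on visited hyper-states.
            self.predictor = nn.Sequential(
                nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )

        def forward(self, state: torch.Tensor, belief: torch.Tensor) -> torch.Tensor:
            hyper_state = torch.cat([state, belief], dim=-1)
            # Prediction error is large on rarely visited hyper-states and shrinks
            # as the predictor is trained, so the bonus decays with familiarity.
            return (self.target(hyper_state) - self.predictor(hyper_state)).pow(2).mean(dim=-1)

In such a scheme, the bonus would be added to the environment reward during meta-training, while the predictor is trained to minimise the same prediction error on visited hyper-states, which is what makes the bonus decay as the agent becomes familiar with a region of hyper-state space.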

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zintgraf21a,
  title     = {Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning},
  author    = {Zintgraf, Luisa M and Feng, Leo and Lu, Cong and Igl, Maximilian and Hartikainen, Kristian and Hofmann, Katja and Whiteson, Shimon},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12991--13001},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zintgraf21a/zintgraf21a.pdf},
  url       = {https://proceedings.mlr.press/v139/zintgraf21a.html},
  abstract  = {To rapidly learn a new task, it is often essential for agents to explore efficiently, especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods, however, rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent’s task belief). We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.}
}
Endnote
%0 Conference Paper
%T Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
%A Luisa M Zintgraf
%A Leo Feng
%A Cong Lu
%A Maximilian Igl
%A Kristian Hartikainen
%A Katja Hofmann
%A Shimon Whiteson
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zintgraf21a
%I PMLR
%P 12991--13001
%U https://proceedings.mlr.press/v139/zintgraf21a.html
%V 139
%X To rapidly learn a new task, it is often essential for agents to explore efficiently, especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods, however, rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent’s task belief). We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
APA
Zintgraf, L.M., Feng, L., Lu, C., Igl, M., Hartikainen, K., Hofmann, K. & Whiteson, S. (2021). Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12991-13001. Available from https://proceedings.mlr.press/v139/zintgraf21a.html.
