Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

Evan Z Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:6925-6935, 2021.

Abstract

The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task. In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance. However, such meta-RL approaches struggle with local optima due to a chicken-and-egg problem: learning to explore requires good exploitation to gauge the exploration’s utility, but learning to exploit requires information gathered via exploration. Optimizing separate objectives for exploration and exploitation can avoid this problem, but prior meta-RL exploration objectives yield suboptimal policies that gather information irrelevant to the task. We alleviate both concerns by constructing an exploitation objective that automatically identifies task-relevant information and an exploration objective to recover only this information; we call the resulting method DREAM. This avoids local optima in end-to-end training, without sacrificing optimal exploration. Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. Videos of DREAM: https://ezliu.github.io/dream/
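
The decoupling described in the abstract can be made concrete with a small sketch. The following PyTorch fragment is illustrative only and is not the authors' implementation: the module names (TaskEncoder, TrajectoryEncoder), the L2 penalty standing in for the information bottleneck, and the squared-error recovery loss are all assumptions chosen to show the shape of the two separate objectives, i.e. exploitation keeps only task-relevant information while exploration is trained to recover exactly that information.

import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    # Maps a one-hot task descriptor to an embedding used by the exploitation policy.
    def __init__(self, num_tasks, dim):
        super().__init__()
        self.net = nn.Linear(num_tasks, dim)

    def forward(self, task_id):
        return self.net(task_id)

class TrajectoryEncoder(nn.Module):
    # Encodes a (flattened, for simplicity) exploration trajectory into the same space.
    def __init__(self, traj_dim, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(traj_dim, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, traj):
        return self.net(traj)

def exploitation_loss(task_embedding, rl_loss, bottleneck_weight=1e-3):
    # Exploitation objective: solve the task given the task embedding, with a simple
    # L2 penalty (a stand-in for an information bottleneck) so the embedding keeps
    # only task-relevant information.
    return rl_loss + bottleneck_weight * task_embedding.pow(2).sum(-1).mean()

def exploration_loss(traj_embedding, task_embedding):
    # Exploration objective: the trajectory encoding is trained to recover the
    # (bottlenecked) task embedding, so exploration only needs to gather the
    # information the exploitation policy actually uses.
    return (traj_embedding - task_embedding.detach()).pow(2).sum(-1).mean()

if __name__ == "__main__":
    num_tasks, traj_dim, dim = 10, 32, 8
    f_task = TaskEncoder(num_tasks, dim)
    q_traj = TrajectoryEncoder(traj_dim, dim)
    task_id = torch.eye(num_tasks)[:4]        # batch of 4 one-hot task descriptors
    traj = torch.randn(4, traj_dim)           # dummy exploration trajectories
    rl_loss = torch.tensor(1.0)               # placeholder for the usual RL objective
    z = f_task(task_id)
    print(exploitation_loss(z, rl_loss).item(), exploration_loss(q_traj(traj), z).item())

Because the two losses are optimized separately, the exploration policy no longer depends on a well-trained exploitation policy to receive useful learning signal, which is the chicken-and-egg problem the abstract describes.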

Cite this Paper
BibTeX
@InProceedings{pmlr-v139-liu21s,
  title     = {Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices},
  author    = {Liu, Evan Z and Raghunathan, Aditi and Liang, Percy and Finn, Chelsea},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {6925--6935},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/liu21s/liu21s.pdf},
  url       = {https://proceedings.mlr.press/v139/liu21s.html}
}
Endnote
%0 Conference Paper
%T Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
%A Evan Z Liu
%A Aditi Raghunathan
%A Percy Liang
%A Chelsea Finn
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-liu21s
%I PMLR
%P 6925--6935
%U https://proceedings.mlr.press/v139/liu21s.html
%V 139
APA
Liu, E.Z., Raghunathan, A., Liang, P. & Finn, C. (2021). Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:6925-6935. Available from https://proceedings.mlr.press/v139/liu21s.html.