Stabilizing Unsupervised Environment Design with a Learned Adversary

Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:270-291, 2023.

Abstract

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
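For context (this is not part of the paper's abstract): in the original PAIRED formulation (Dennis et al., 2020), the teacher is trained with RL to maximize an approximation of the student's regret, computed from the returns of two student agents, a protagonist and an antagonist, on the same generated level. Below is a minimal illustrative sketch of that signal; all function and variable names are hypothetical, not code released with this paper.

# Minimal sketch of the regret signal PAIRED uses to reward its teacher
# (Dennis et al., 2020). Illustrative only; names are hypothetical.

def paired_regret(antagonist_returns, protagonist_returns):
    """Approximate regret of the protagonist on one generated level.

    Both arguments are lists of episodic returns collected on the same
    level. Regret is approximated as the best antagonist return minus
    the protagonist's mean return; the teacher's RL objective is to
    propose levels that maximize this gap.
    """
    best_antagonist = max(antagonist_returns)
    mean_protagonist = sum(protagonist_returns) / len(protagonist_returns)
    return best_antagonist - mean_protagonist

# Example: a level the antagonist can solve (return near 1.0) but the
# protagonist mostly fails on yields high regret, so the teacher is
# pushed toward levels that are solvable yet challenging.
print(paired_regret([1.0, 0.8], [0.1, 0.2]))  # 0.85

Because regret is near zero on levels that are either trivially easy or unsolvable for both agents, this objective keeps the teacher proposing tasks at the frontier of the student's capabilities.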

Cite this Paper

BibTeX
@InProceedings{pmlr-v232-mediratta23a,
  title     = {Stabilizing Unsupervised Environment Design with a Learned Adversary},
  author    = {Mediratta, Ishita and Jiang, Minqi and Parker-Holder, Jack and Dennis, Michael and Vinitsky, Eugene and Rockt\"aschel, Tim},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {270--291},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/mediratta23a/mediratta23a.pdf},
  url       = {https://proceedings.mlr.press/v232/mediratta23a.html}
}
Endnote
%0 Conference Paper
%T Stabilizing Unsupervised Environment Design with a Learned Adversary
%A Ishita Mediratta
%A Minqi Jiang
%A Jack Parker-Holder
%A Michael Dennis
%A Eugene Vinitsky
%A Tim Rocktäschel
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-mediratta23a
%I PMLR
%P 270--291
%U https://proceedings.mlr.press/v232/mediratta23a.html
%V 232
APA
Mediratta, I., Jiang, M., Parker-Holder, J., Dennis, M., Vinitsky, E. & Rocktäschel, T. (2023). Stabilizing Unsupervised Environment Design with a Learned Adversary. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:270-291. Available from https://proceedings.mlr.press/v232/mediratta23a.html.
