Simulation-Acquired Latent Action Spaces for Dynamics Generalization

Nicholas Corrado, Yuxiao Qu, Josiah P. Hanna
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:661-682, 2022.

Abstract

Deep reinforcement learning has shown incredible promise at training high-performing agents to solve high-dimensional continuous control tasks in a particular training environment. However, to be useful in real-world settings, long-lived agents must perform well across a range of environmental conditions. Naively applying deep RL to a task where environment conditions may vary from episode to episode can be data inefficient. To address this inefficiency, we introduce a method that discovers structure in an agent’s high-dimensional continuous action space to speed up learning across a range of environmental conditions. Whereas prior work on finding so-called latent action spaces requires expert demonstrations or on-task experience, we instead propose to discover the latent, lower-dimensional action space in a simulated source environment and then transfer the learned action space for training in the target environment. We evaluate our novel method on randomized variants of simulated MuJoCo environments and find that, when there is a lower-dimensional action space to exploit, our method significantly increases data efficiency. For instance, in the Ant environment, our method reduces the 8-dimensional action space to a 3-dimensional action space and doubles the average return achieved after a training budget of 2 million timesteps.
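The abstract does not spell out how the latent action space is learned, so the following is only a minimal sketch of the general idea it describes, assuming a state-conditioned action autoencoder fit on data collected in the simulated source environment; the class names, the decode helper, and all dimensions (8-D full actions reduced to a 3-D latent space, echoing the Ant example, plus a placeholder state dimension) are illustrative and not the paper's implementation.

# Hypothetical sketch: learn a low-dimensional latent action space from
# source-environment data, then let the agent act in that latent space in
# the target environment by decoding latent actions to full actions.
import numpy as np
import torch
import torch.nn as nn

FULL_DIM, LATENT_DIM, STATE_DIM = 8, 3, 27  # illustrative sizes only

class ActionAutoencoder(nn.Module):
    """Encode full actions into a latent space and decode them back,
    conditioned on the state so one latent coordinate can map to
    different joint commands depending on the robot's pose."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(STATE_DIM + FULL_DIM, 64), nn.ReLU(),
            nn.Linear(64, LATENT_DIM))
        self.decoder = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, FULL_DIM), nn.Tanh())

    def forward(self, state, action):
        z = self.encoder(torch.cat([state, action], dim=-1))
        return self.decoder(torch.cat([state, z], dim=-1))

# Phase 1: fit the autoencoder on (state, action) pairs from the source env.
# Random placeholder data here; in practice these would come from rollouts
# collected in simulation.
states = torch.randn(10_000, STATE_DIM)
actions = torch.rand(10_000, FULL_DIM) * 2 - 1
model = ActionAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    recon = model(states, actions)
    loss = nn.functional.mse_loss(recon, actions)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: the agent chooses a 3-D latent action; the frozen decoder maps it
# to the full 8-D action expected by the target environment.
def decode(state, latent_action):
    with torch.no_grad():
        s = torch.as_tensor(state, dtype=torch.float32)
        z = torch.as_tensor(latent_action, dtype=torch.float32)
        return model.decoder(torch.cat([s, z], dim=-1)).numpy()

print(decode(np.zeros(STATE_DIM), np.zeros(LATENT_DIM)).shape)  # (8,)

Conditioning the decoder on the state is one plausible design choice; a plain action-only autoencoder would fit the same two-phase template of learning the mapping in simulation and reusing it when training in the target environment.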

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-corrado22a,
  title     = {Simulation-Acquired Latent Action Spaces for Dynamics Generalization},
  author    = {Corrado, Nicholas and Qu, Yuxiao and Hanna, Josiah P.},
  booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents},
  pages     = {661--682},
  year      = {2022},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina},
  volume    = {199},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v199/corrado22a/corrado22a.pdf},
  url       = {https://proceedings.mlr.press/v199/corrado22a.html}
}
Endnote
%0 Conference Paper
%T Simulation-Acquired Latent Action Spaces for Dynamics Generalization
%A Nicholas Corrado
%A Yuxiao Qu
%A Josiah P. Hanna
%B Proceedings of The 1st Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2022
%E Sarath Chandar
%E Razvan Pascanu
%E Doina Precup
%F pmlr-v199-corrado22a
%I PMLR
%P 661--682
%U https://proceedings.mlr.press/v199/corrado22a.html
%V 199
APA
Corrado, N., Qu, Y. & Hanna, J.P.. (2022). Simulation-Acquired Latent Action Spaces for Dynamics Generalization. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:661-682 Available from https://proceedings.mlr.press/v199/corrado22a.html.
