Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation

Christopher R. Dance, Julien Perez, Théo Cachet
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2376-2387, 2021.

Abstract

In few-shot imitation, an agent is given a few demonstrations of a previously unseen task, and must then successfully perform that task. We propose a novel approach to learning few-shot-imitation agents that we call demonstration-conditioned reinforcement learning (DCRL). Given a training set consisting of demonstrations, reward functions and transition distributions for multiple tasks, the idea is to work with a policy that takes demonstrations as input, and to train this policy to maximize the average of the cumulative reward over the set of training tasks. Relative to previously proposed few-shot imitation methods that use behaviour cloning or infer reward functions from demonstrations, our method has the disadvantage that it requires reward functions at training time. However, DCRL also has several advantages, such as the ability to improve upon suboptimal demonstrations, to operate given state-only demonstrations, and to cope with a domain shift between the demonstrator and the agent. Moreover, we show that DCRL outperforms methods based on behaviour cloning by a large margin, on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.
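
The abstract describes the DCRL objective only in words. As a rough, hedged illustration (not the paper's architecture or code), the toy Python sketch below shows a policy that takes a demonstration as input, and an estimate of the objective such a policy is trained to maximize: the average cumulative reward over a set of training tasks, each with its own reward function and transition function. All names here (Task, policy, dcrl_objective) and the one-dimensional toy tasks are assumptions introduced for illustration; the paper itself trains deep demonstration-conditioned policies with RL on navigation and Meta-World manipulation tasks.

# Illustrative sketch only: a minimal demonstration-conditioned policy and an
# estimate of the DCRL objective (average return over training tasks).
# The interfaces below are hypothetical and NOT taken from the paper.
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = float
Action = float
Demonstration = List[State]  # a state-only demonstration, as discussed in the abstract

@dataclass
class Task:
    """Hypothetical training task: a reward function, a transition function, and demos."""
    reward: Callable[[State, Action], float]
    step: Callable[[State, Action], State]
    demos: List[Demonstration]
    initial_state: State = 0.0

def policy(state: State, demo: Demonstration, theta: float) -> Action:
    """Toy demonstration-conditioned policy: steer towards the demo's final state."""
    target = demo[-1]
    return theta * (target - state)

def episode_return(task: Task, demo: Demonstration, theta: float,
                   horizon: int = 20, gamma: float = 0.99) -> float:
    """Cumulative discounted reward of one rollout, conditioned on one demonstration."""
    s, ret = task.initial_state, 0.0
    for t in range(horizon):
        a = policy(s, demo, theta)
        ret += (gamma ** t) * task.reward(s, a)
        s = task.step(s, a)
    return ret

def dcrl_objective(tasks: Sequence[Task], theta: float) -> float:
    """DCRL objective as stated in the abstract: average cumulative reward over
    training tasks, with the policy conditioned on a demonstration of each task."""
    total = 0.0
    for task in tasks:
        demo = random.choice(task.demos)
        total += episode_return(task, demo, theta)
    return total / len(tasks)

# Usage: two toy "reach the goal" tasks; compare a few policy parameters.
def make_goal_task(goal: State) -> Task:
    return Task(reward=lambda s, a, g=goal: -abs(s - g),
                step=lambda s, a: s + a,
                demos=[[0.0, goal * 0.5, goal]])

tasks = [make_goal_task(1.0), make_goal_task(-2.0)]
for theta in (0.1, 0.5, 0.9):
    print(theta, round(dcrl_objective(tasks, theta), 3))

In the actual method, theta would be the parameters of a deep policy network and the maximization would be carried out with a reinforcement-learning algorithm rather than the exhaustive comparison shown here.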

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-dance21a,
  title     = {Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation},
  author    = {Dance, Christopher R. and Perez, Julien and Cachet, Th{\'e}o},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2376--2387},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/dance21a/dance21a.pdf},
  url       = {https://proceedings.mlr.press/v139/dance21a.html}
}
APA
Dance, C. R., Perez, J., & Cachet, T. (2021). Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2376-2387. Available from https://proceedings.mlr.press/v139/dance21a.html.
