Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2376-2387, 2021.
In few-shot imitation, an agent is given a few demonstrations of a previously unseen task, and must then successfully perform that task. We propose a novel approach to learning few-shot-imitation agents that we call demonstration-conditioned reinforcement learning (DCRL). Given a training set consisting of demonstrations, reward functions and transition distributions for multiple tasks, the idea is to work with a policy that takes demonstrations as input, and to train this policy to maximize the average of the cumulative reward over the set of training tasks. Relative to previously proposed few-shot imitation methods that use behaviour cloning or infer reward functions from demonstrations, our method has the disadvantage that it requires reward functions at training time. However, DCRL also has several advantages, such as the ability to improve upon suboptimal demonstrations, to operate given state-only demonstrations, and to cope with a domain shift between the demonstrator and the agent. Moreover, we show that DCRL outperforms methods based on behaviour cloning by a large margin, on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.