Learning Skills Diverse in Value-Relevant Features
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:1174-1194, 2022.
Abstract
Behavioural abstraction via temporally extended actions is vital to solving large-scale reinforcement learning problems. Skills structure exploration, speed up credit assignment, and can be used in transfer learning. However, such abstraction is often difficult or expensive for experts to craft by hand. Unsupervised information-theoretic methods (Gregor et al., 2016; Eysenbach et al., 2019; Sharma et al., 2020) address this problem by learning a set of skills without using environment rewards, typically by maximizing discriminability of the states visited by individual skills. However, since only some features of the state matter in complex environments, these methods often discover behaviours that are trivially diverse, learning skills that are not helpful for downstream tasks. To overcome this limitation, we propose a method for learning skills that only control features important to the tasks of interest. First, by training on a small set of source tasks, the agent learns which features are most relevant. Then, the discriminability objective for an unsupervised information-theoretic method is defined for this learned feature space. This allows the construction of sets of diverse and useful skills that can control the most important features. Experimental results in continuous control domains validate our method, demonstrating that it yields skills that substantially improve learning on downstream locomotion tasks with sparse rewards.
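The abstract's core idea can be illustrated with a small sketch: a DIAYN-style skill discriminator that sees states only through a feature map restricted to value-relevant features (the map itself would be learned on the source tasks). This is a minimal illustration assuming PyTorch; `SkillDiscriminator`, `feature_map`, and `intrinsic_reward` are hypothetical names, not the paper's implementation.

```python
# Minimal sketch: discriminability-based intrinsic reward computed on a
# learned, value-relevant feature space (assumed PyTorch; names hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkillDiscriminator(nn.Module):
    """Predicts which skill produced a state, seen only through phi(s)."""

    def __init__(self, feature_dim: int, num_skills: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_skills),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Logits over skills, conditioned only on value-relevant features.
        return self.net(features)


def intrinsic_reward(discriminator, feature_map, state, skill, num_skills):
    """DIAYN-style reward r = log q(skill | phi(s)) - log p(skill),
    with p uniform over skills and phi the value-relevant feature map."""
    with torch.no_grad():
        feats = feature_map(state)                # drop task-irrelevant features
        log_q = F.log_softmax(discriminator(feats), dim=-1)[..., skill]
    return log_q + torch.log(torch.tensor(float(num_skills)))
```

Because the discriminator only observes the value-relevant features, skills can only be made distinguishable by controlling those features, rather than by trivially diverse behaviour in irrelevant dimensions of the state.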