Learning Skills Diverse in Value-Relevant Features

Matthew J. A. Smith, Jelena Luketina, Kristian Hartikainen, Maximilian Igl, Shimon Whiteson
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:1174-1194, 2022.

Abstract

Behavioural abstraction via temporally extended actions is vital to solving large-scale reinforcement learning problems. Skills structure exploration, speed up credit assignment, and can be used in transfer learning. However, such abstraction is often difficult or expensive for experts to craft by hand. Unsupervised information-theoretic methods (Gregor et al., 2016; Eysenbach et al., 2019; Sharma et al., 2020) address this problem by learning a set of skills without using environment rewards, typically by maximizing discriminability of the states visited by individual skills. However, since only some features of the state matter in complex environments, these methods often discover behaviours that are trivially diverse, learning skills that are not helpful for downstream tasks. To overcome this limitation, we propose a method for learning skills that only control features important to the tasks of interest. First, by training on a small set of source tasks, the agent learns which features are most relevant. Then, the discriminability objective for an unsupervised information-theoretic method is defined for this learned feature space. This allows the construction of sets of diverse and useful skills that can control the most important features. Experimental results in continuous control domains validate our method, demonstrating that it yields skills that substantially improve learning on downstream locomotion tasks with sparse rewards.
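The abstract describes a two-stage recipe: first learn which state features are value-relevant from a small set of source tasks, then run an information-theoretic skill-discovery objective (à la DIAYN) whose discriminator sees only those features. The sketch below is not the authors' code; it is a minimal illustration, assuming the value-relevant features are summarised as a fixed boolean mask (here hypothetical) and using the standard variational lower bound on the mutual information between skill and features as the intrinsic reward.

# Minimal sketch (assumptions: a precomputed boolean feature mask from the
# source tasks; a DIAYN-style discriminator and intrinsic reward).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    def __init__(self, state_dim, n_skills, feature_mask, hidden=128):
        super().__init__()
        # feature_mask: bool tensor marking the value-relevant state features
        self.register_buffer("mask", feature_mask)
        in_dim = int(feature_mask.sum())
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def forward(self, state):
        # Discriminate skills only from the masked (value-relevant) features,
        # so behaviour that is diverse only in irrelevant features earns no reward.
        return self.net(state[..., self.mask])

def intrinsic_reward(discriminator, state, skill_idx, n_skills):
    # Variational lower bound on I(skill; features):
    # log q(skill | features) - log p(skill), with p(skill) uniform.
    logits = discriminator(state)
    log_q = F.log_softmax(logits, dim=-1)
    log_q = log_q.gather(-1, skill_idx.long().unsqueeze(-1)).squeeze(-1)
    return log_q + torch.log(torch.tensor(float(n_skills)))

In practice the discriminator is trained to classify the active skill from visited (masked) features, while the skill-conditioned policy maximises this intrinsic reward, yielding skills that are diverse specifically in the value-relevant subspace.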

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-smith22a,
  title     = {Learning Skills Diverse in Value-Relevant Features},
  author    = {Smith, Matthew J. A. and Luketina, Jelena and Hartikainen, Kristian and Igl, Maximilian and Whiteson, Shimon},
  booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents},
  pages     = {1174--1194},
  year      = {2022},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina},
  volume    = {199},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v199/smith22a/smith22a.pdf},
  url       = {https://proceedings.mlr.press/v199/smith22a.html}
}
Endnote
%0 Conference Paper
%T Learning Skills Diverse in Value-Relevant Features
%A Matthew J. A. Smith
%A Jelena Luketina
%A Kristian Hartikainen
%A Maximilian Igl
%A Shimon Whiteson
%B Proceedings of The 1st Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2022
%E Sarath Chandar
%E Razvan Pascanu
%E Doina Precup
%F pmlr-v199-smith22a
%I PMLR
%P 1174--1194
%U https://proceedings.mlr.press/v199/smith22a.html
%V 199
APA
Smith, M.J.A., Luketina, J., Hartikainen, K., Igl, M. & Whiteson, S. (2022). Learning Skills Diverse in Value-Relevant Features. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:1174-1194. Available from https://proceedings.mlr.press/v199/smith22a.html.