Policy Caches with Successor Features

Mark Nemecek, Ronald Parr
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8025-8033, 2021.

Abstract

Transfer in reinforcement learning is based on the idea that it is possible to use what is learned in one task to improve the learning process in another task. For transfer between tasks which share transition dynamics but differ in reward function, successor features have been shown to be a useful representation which allows for efficient computation of action-value functions for previously-learned policies in new tasks. These functions induce policies in the new tasks, so an agent may not need to learn a new policy for each new task it encounters, especially if it is allowed some amount of suboptimality in those tasks. We present new bounds for the performance of optimal policies in a new task, as well as an approach to use these bounds to decide, when presented with a new task, whether to use cached policies or learn a new policy.
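The abstract describes the mechanism that makes cached policies reusable: with successor features, the action-value function of a previously learned policy under a new reward can be obtained from a single dot product, and acting greedily over the whole cache corresponds to generalized policy improvement. The sketch below is a rough illustration of that mechanism only, not code from the paper; the array shapes, function names, and tabular setting are assumptions, and the paper's performance bounds and its reuse-versus-learn decision rule are not reproduced here.

import numpy as np

# Assumed setup (illustrative): each cached policy pi_i has successor features
# psi_i(s, a) = E[ sum_t gamma^t * phi(s_t, a_t) | s_0 = s, a_0 = a, pi_i ].
# If a new task's reward is r(s, a) = phi(s, a) . w_new, then
#     Q^{pi_i}_{w_new}(s, a) = psi_i(s, a) . w_new,
# so every cached policy can be evaluated on the new task with one dot product.

def cached_q_values(psi_cache, w_new):
    """Action-values of every cached policy under new reward weights w_new."""
    # psi_cache: (num_policies, num_states, num_actions, num_features)
    # returns:   (num_policies, num_states, num_actions)
    return psi_cache @ w_new

def gpi_policy(psi_cache, w_new):
    """Generalized policy improvement: act greedily w.r.t. the best cached Q-value."""
    q = cached_q_values(psi_cache, w_new)   # (P, S, A)
    q_best = q.max(axis=0)                  # best cached value for each (s, a)
    return q_best.argmax(axis=1), q_best    # greedy action per state, and its values

# Example with made-up sizes: 3 cached policies, 5 states, 2 actions, 4 features.
rng = np.random.default_rng(0)
psi_cache = rng.random((3, 5, 2, 4))
w_new = rng.random(4)
actions, q_best = gpi_policy(psi_cache, w_new)

In this illustration, the values returned by gpi_policy are the kind of quantity one would compare against a bound on the optimal value in the new task when deciding whether the cache is good enough or a new policy should be learned.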

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-nemecek21a,
  title     = {Policy Caches with Successor Features},
  author    = {Nemecek, Mark and Parr, Ronald},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8025--8033},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/nemecek21a/nemecek21a.pdf},
  url       = {https://proceedings.mlr.press/v139/nemecek21a.html},
  abstract  = {Transfer in reinforcement learning is based on the idea that it is possible to use what is learned in one task to improve the learning process in another task. For transfer between tasks which share transition dynamics but differ in reward function, successor features have been shown to be a useful representation which allows for efficient computation of action-value functions for previously-learned policies in new tasks. These functions induce policies in the new tasks, so an agent may not need to learn a new policy for each new task it encounters, especially if it is allowed some amount of suboptimality in those tasks. We present new bounds for the performance of optimal policies in a new task, as well as an approach to use these bounds to decide, when presented with a new task, whether to use cached policies or learn a new policy.}
}
Endnote
%0 Conference Paper
%T Policy Caches with Successor Features
%A Mark Nemecek
%A Ronald Parr
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-nemecek21a
%I PMLR
%P 8025--8033
%U https://proceedings.mlr.press/v139/nemecek21a.html
%V 139
%X Transfer in reinforcement learning is based on the idea that it is possible to use what is learned in one task to improve the learning process in another task. For transfer between tasks which share transition dynamics but differ in reward function, successor features have been shown to be a useful representation which allows for efficient computation of action-value functions for previously-learned policies in new tasks. These functions induce policies in the new tasks, so an agent may not need to learn a new policy for each new task it encounters, especially if it is allowed some amount of suboptimality in those tasks. We present new bounds for the performance of optimal policies in a new task, as well as an approach to use these bounds to decide, when presented with a new task, whether to use cached policies or learn a new policy.
APA
Nemecek, M. & Parr, R. (2021). Policy Caches with Successor Features. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8025-8033. Available from https://proceedings.mlr.press/v139/nemecek21a.html.