Composing Value Functions in Reinforcement Learning

Benjamin Van Niekerk, Steven James, Adam Earle, Benjamin Rosman
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6401-6409, 2019.

Abstract

An important property for lifelong-learning agents is the ability to combine existing skills to solve new unseen tasks. In general, however, it is unclear how to compose existing skills in a principled manner. Under the assumption of deterministic dynamics, we prove that optimal value function composition can be achieved in entropy-regularised reinforcement learning (RL), and extend this result to the standard RL setting. Composition is demonstrated in a high-dimensional video game, where an agent with an existing library of skills is immediately able to solve new tasks without the need for further learning.
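The composition described in the abstract can be made concrete with a small sketch. Below, assuming tabular Q-functions and the disjunctive ("achieve task A or task B") composition studied in the paper: in entropy-regularised RL the composed value function is a weighted soft-maximum (log-sum-exp at temperature tau) of the individual optimal Q-functions, which approaches the hard maximum used in the standard RL setting as tau goes to 0. The function and variable names (compose_or, q_a, q_b) are illustrative, not from the paper.

```python
import numpy as np

def compose_or(q_functions, tau=1.0, weights=None):
    """Disjunctive composition of pre-trained Q-functions.

    Entropy-regularised RL (tau > 0): weighted log-sum-exp (soft-max)
    of the individual optimal Q-values over the task axis. As tau -> 0
    this approaches the hard maximum, the standard-RL composition.
    q_functions: array-like of shape (n_tasks, n_states, n_actions).
    """
    q = np.asarray(q_functions, dtype=float)
    if weights is None:
        weights = np.full(q.shape[0], 1.0 / q.shape[0])
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    # Numerically stabilised log-sum-exp over tasks.
    q_max = q.max(axis=0)
    return q_max + tau * np.log((w * np.exp((q - q_max) / tau)).sum(axis=0))

# Toy example: two existing skills over 4 states and 2 actions.
rng = np.random.default_rng(0)
q_a = rng.normal(size=(4, 2))   # skill A (e.g. "collect blue objects")
q_b = rng.normal(size=(4, 2))   # skill B (e.g. "collect purple objects")

q_or = compose_or([q_a, q_b], tau=0.1)  # near the hard max for small tau
policy = q_or.argmax(axis=1)            # act greedily, no further learning
print(policy)
```

Acting greedily with respect to the composed Q-table is what allows the new task to be solved without additional training, which is the behaviour the abstract reports in the video-game experiments.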

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-van-niekerk19a,
  title     = {Composing Value Functions in Reinforcement Learning},
  author    = {Van Niekerk, Benjamin and James, Steven and Earle, Adam and Rosman, Benjamin},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6401--6409},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/van-niekerk19a/van-niekerk19a.pdf},
  url       = {https://proceedings.mlr.press/v97/van-niekerk19a.html},
  abstract  = {An important property for lifelong-learning agents is the ability to combine existing skills to solve new unseen tasks. In general, however, it is unclear how to compose existing skills in a principled manner. Under the assumption of deterministic dynamics, we prove that optimal value function composition can be achieved in entropy-regularised reinforcement learning (RL), and extend this result to the standard RL setting. Composition is demonstrated in a high-dimensional video game, where an agent with an existing library of skills is immediately able to solve new tasks without the need for further learning.}
}
Endnote
%0 Conference Paper
%T Composing Value Functions in Reinforcement Learning
%A Benjamin Van Niekerk
%A Steven James
%A Adam Earle
%A Benjamin Rosman
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-van-niekerk19a
%I PMLR
%P 6401--6409
%U https://proceedings.mlr.press/v97/van-niekerk19a.html
%V 97
%X An important property for lifelong-learning agents is the ability to combine existing skills to solve new unseen tasks. In general, however, it is unclear how to compose existing skills in a principled manner. Under the assumption of deterministic dynamics, we prove that optimal value function composition can be achieved in entropy-regularised reinforcement learning (RL), and extend this result to the standard RL setting. Composition is demonstrated in a high-dimensional video game, where an agent with an existing library of skills is immediately able to solve new tasks without the need for further learning.
APA
Van Niekerk, B., James, S., Earle, A. & Rosman, B. (2019). Composing Value Functions in Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6401-6409. Available from https://proceedings.mlr.press/v97/van-niekerk19a.html.
