Universal Value Function Approximators

Tom Schaul, Daniel Horgan, Karol Gregor, David Silver
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1312-1320, 2015.

Abstract

Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
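The factorisation the abstract describes can be illustrated with a small sketch: approximate a table of values V(s, g) as a dot product φ(s) · ψ(g) of learned state and goal embeddings. The toy ring-world task, dimensions, and plain gradient descent below are illustrative assumptions, not the paper's experimental setup, and this shows only the factorisation stage (the paper then regresses φ and ψ from state and goal features in a second stage).

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_goals, k = 10, 10, 3  # k = embedding dimension (assumed)

# Toy ground-truth values: discounted reward gamma^distance for reaching
# goal g from state s on a ring of 10 states (shortest-path distance).
gamma = 0.9
d = np.abs(np.arange(n_states)[:, None] - np.arange(n_goals)[None, :])
dist = np.minimum(d, n_states - d)
V_true = gamma ** dist

# Factor the observed value table into state embeddings phi and goal
# embeddings psi so that phi @ psi.T approximates V_true.
phi = rng.normal(scale=0.1, size=(n_states, k))
psi = rng.normal(scale=0.1, size=(n_goals, k))

lr = 0.1
for _ in range(2000):
    err = phi @ psi.T - V_true          # prediction error on all (s, g) pairs
    phi -= lr * err @ psi / n_goals     # gradient step on squared error
    psi -= lr * err.T @ phi / n_states

mse = np.mean((phi @ psi.T - V_true) ** 2)
```

After training, each row of `phi` is a state embedding and each row of `psi` a goal embedding; generalisation to a new goal then only requires producing its embedding, not relearning a whole value function.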

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-schaul15,
  title     = {Universal Value Function Approximators},
  author    = {Tom Schaul and Daniel Horgan and Karol Gregor and David Silver},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {1312--1320},
  year      = {2015},
  editor    = {Francis Bach and David Blei},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/schaul15.pdf},
  url       = {http://proceedings.mlr.press/v37/schaul15.html},
  abstract  = {Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.}
}
Endnote
%0 Conference Paper
%T Universal Value Function Approximators
%A Tom Schaul
%A Daniel Horgan
%A Karol Gregor
%A David Silver
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-schaul15
%I PMLR
%J Proceedings of Machine Learning Research
%P 1312--1320
%U http://proceedings.mlr.press
%V 37
%W PMLR
%X Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
RIS
TY - CPAPER
TI - Universal Value Function Approximators
AU - Tom Schaul
AU - Daniel Horgan
AU - Karol Gregor
AU - David Silver
BT - Proceedings of the 32nd International Conference on Machine Learning
PY - 2015/06/01
DA - 2015/06/01
ED - Francis Bach
ED - David Blei
ID - pmlr-v37-schaul15
PB - PMLR
SP - 1312
DP - PMLR
EP - 1320
L1 - http://proceedings.mlr.press/v37/schaul15.pdf
UR - http://proceedings.mlr.press/v37/schaul15.html
AB - Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
ER -
APA
Schaul, T., Horgan, D., Gregor, K. & Silver, D. (2015). Universal Value Function Approximators. Proceedings of the 32nd International Conference on Machine Learning, in PMLR 37:1312-1320.