Universal Value Function Approximators

Tom Schaul; Daniel Horgan; Karol Gregor; David Silver

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1312-1320, 2015.

Abstract

Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
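The two-stage supervised procedure described in the abstract (factor observed values into state and goal embeddings, then learn mappings from s and g to those embeddings) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated above: the observed values form a fully observed states-by-goals table, the factorization is a truncated SVD, each mapping is a linear least-squares regression on hypothetical feature vectors, and the two embeddings are combined by a dot product. The function names (`factorize_values`, `fit_linear_map`, `uvfa`) and the synthetic data are illustrative only, not the authors' code; the paper's own choices may differ at each step.

```python
import numpy as np


def factorize_values(M, rank):
    """Stage 1: split a (states x goals) value table into embeddings.

    Assumption: a truncated SVD is used, so M ~= phi @ psi.T with
    phi of shape (n_states, rank) and psi of shape (n_goals, rank).
    """
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(S[:rank])
    phi = U[:, :rank] * root      # one state embedding per row
    psi = Vt[:rank].T * root      # one goal embedding per row
    return phi, psi


def fit_linear_map(features, targets):
    """Stage 2 (assumed linear): least-squares map from features to embeddings."""
    W, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return W


def uvfa(state_feats, goal_feats, W_s, W_g):
    """V(s, g) as the dot product of the predicted state and goal embeddings."""
    return (state_feats @ W_s) @ (goal_feats @ W_g).T


# Synthetic, exactly low-rank data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
rank = 3
X_s = rng.normal(size=(50, 5))     # hypothetical state features
X_g = rng.normal(size=(40, 4))     # hypothetical goal features
A = rng.normal(size=(5, rank))
B = rng.normal(size=(4, rank))
V_true = (X_s @ A) @ (X_g @ B).T   # "observed" values for every (s, g) pair

# Train on the first 30 goals and hold out the last 10 as unseen goals.
train, test = slice(0, 30), slice(30, 40)
phi, psi = factorize_values(V_true[:, train], rank)
W_s = fit_linear_map(X_s, phi)
W_g = fit_linear_map(X_g[train], psi)

# Generalise to goals that never appeared in the factorized table.
V_pred = uvfa(X_s, X_g[test], W_s, W_g)
err = np.max(np.abs(V_pred - V_true[:, test]))
print(f"max error on unseen goals: {err:.2e}")
```

Because the synthetic values are exactly low-rank and linear in the features, the held-out goals are recovered up to floating-point error; real value tables are noisy and usually only partially observed, which is where a matrix-completion step and non-linear (e.g. neural) mappings would replace the SVD and the least-squares fits.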

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-schaul15,
  title = 	 {Universal Value Function Approximators},
  author = 	 {Schaul, Tom and Horgan, Daniel and Gregor, Karol and Silver, David},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1312--1320},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/schaul15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/schaul15.html},
  abstract = 	 {Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; theta) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s,g;theta) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.}
}

Endnote

%0 Conference Paper
%T Universal Value Function Approximators
%A Tom Schaul
%A Daniel Horgan
%A Karol Gregor
%A David Silver
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-schaul15
%I PMLR
%P 1312--1320
%U https://proceedings.mlr.press/v37/schaul15.html
%V 37
%X Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; theta) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s,g;theta) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.

RIS


TY  - CPAPER
TI  - Universal Value Function Approximators
AU  - Tom Schaul
AU  - Daniel Horgan
AU  - Karol Gregor
AU  - David Silver
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-schaul15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1312
EP  - 1320
L1  - http://proceedings.mlr.press/v37/schaul15.pdf
UR  - https://proceedings.mlr.press/v37/schaul15.html
AB  - Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; theta) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s,g;theta) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
ER  -

APA


Schaul, T., Horgan, D., Gregor, K. & Silver, D. (2015). Universal Value Function Approximators. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1312-1320. Available from https://proceedings.mlr.press/v37/schaul15.html.
