Actor-Critic Reinforcement Learning with Energy-Based Policies
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:45-58, 2013.
Abstract
We consider reinforcement learning in Markov decision processes with high-dimensional state and action spaces. We parameterize policies using energy-based models (particularly restricted Boltzmann machines) and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models trained with a non-linear variant of temporal-difference (TD) learning. Unfortunately, non-linear TD is known to diverge in theory and practice. We introduce the first sound and efficient algorithm for training energy-based policies, based on an actor-critic architecture. Our algorithm is computationally efficient, converges close to a local optimum, and outperforms Sallans and Hinton (2004) in several high-dimensional domains.
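To make the abstract's setup concrete, the sketch below shows one plausible instantiation (not the authors' code) of an energy-based policy: an RBM over state and action bits defines a free energy F(s,a), the policy is pi(a|s) proportional to exp(-F(s,a)), and the parameters are updated with an actor-critic policy gradient driven by the TD error of a linear critic. All names, the toy environment, the enumerated action set, and the hyperparameters are illustrative assumptions; the state bias term of the RBM is omitted because it cancels in the action distribution.

```python
# Illustrative sketch only: RBM-parameterized energy-based policy trained with
# an actor-critic policy gradient. Hyperparameters and the toy environment are
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_state, n_action_bits, n_hidden = 6, 2, 8
# Enumerate all binary action vectors (only feasible for tiny action spaces).
actions = np.array([[(k >> i) & 1 for i in range(n_action_bits)]
                    for k in range(2 ** n_action_bits)], dtype=float)

# RBM parameters defining the energy over (state, action, hidden) units.
W_s = 0.01 * rng.standard_normal((n_state, n_hidden))
W_a = 0.01 * rng.standard_normal((n_action_bits, n_hidden))
b_a = np.zeros(n_action_bits)
c = np.zeros(n_hidden)

# Linear critic: V(s) = w^T s.
w = np.zeros(n_state)

def free_energy(s, a):
    """Free energy of the RBM with hidden units marginalized out."""
    pre = c + s @ W_s + a @ W_a
    return -a @ b_a - np.sum(np.logaddexp(0.0, pre))

def policy(s):
    """pi(a|s) proportional to exp(-F(s,a)), via exact enumeration of actions."""
    F = np.array([free_energy(s, a) for a in actions])
    p = np.exp(-(F - F.min()))
    return p / p.sum()

def grad_log_pi(s, a_idx, probs):
    """Gradient of log pi(a|s) w.r.t. (W_s, W_a, b_a, c)."""
    def neg_dF(a):  # gradient of -F(s, a) w.r.t. the parameters
        p_h = 1.0 / (1.0 + np.exp(-(c + s @ W_s + a @ W_a)))
        return (np.outer(s, p_h), np.outer(a, p_h), a.copy(), p_h)
    gs = [neg_dF(a) for a in actions]
    chosen = gs[a_idx]
    expected = [sum(probs[k] * gs[k][i] for k in range(len(actions)))
                for i in range(4)]
    return [chosen[i] - expected[i] for i in range(4)]

alpha_actor, alpha_critic, gamma = 0.05, 0.1, 0.95

def toy_env_step(s, a):
    """Hypothetical stand-in environment: reward for matching the first state
    bits with the action bits; the next state is random binary."""
    r = float(np.all(a == s[:n_action_bits]))
    return rng.integers(0, 2, n_state).astype(float), r

s = rng.integers(0, 2, n_state).astype(float)
for t in range(2000):
    probs = policy(s)
    a_idx = rng.choice(len(actions), p=probs)
    s_next, r = toy_env_step(s, actions[a_idx])
    delta = r + gamma * (w @ s_next) - w @ s       # TD error from the critic
    w += alpha_critic * delta * s                  # critic (TD(0)) update
    gWs, gWa, gba, gc = grad_log_pi(s, a_idx, probs)
    W_s += alpha_actor * delta * gWs               # actor policy-gradient update
    W_a += alpha_actor * delta * gWa
    b_a += alpha_actor * delta * gba
    c += alpha_actor * delta * gc
    s = s_next
```

In high-dimensional action spaces the exact enumeration above would be replaced by sampling (e.g. Gibbs sampling over action and hidden units), which is where the efficiency concerns discussed in the paper arise.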