Deterministic Policy Gradient Algorithms

David Silver; Guy Lever; Nicolas Heess; Thomas Degris; Daan Wierstra; Martin Riedmiller

Deterministic Policy Gradient Algorithms

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):387-395, 2014.

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-silver14,
  title = 	 {Deterministic Policy Gradient Algorithms},
  author = 	 {Silver, David and Lever, Guy and Heess, Nicolas and Degris, Thomas and Wierstra, Daan and Riedmiller, Martin},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {387--395},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/silver14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/silver14.html},
  abstract = 	 {In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.}
}

Endnote

%0 Conference Paper
%T Deterministic Policy Gradient Algorithms
%A David Silver
%A Guy Lever
%A Nicolas Heess
%A Thomas Degris
%A Daan Wierstra
%A Martin Riedmiller
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-silver14
%I PMLR
%P 387--395
%U https://proceedings.mlr.press/v32/silver14.html
%V 32
%N 1
%X In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.

RIS


TY  - CPAPER
TI  - Deterministic Policy Gradient Algorithms
AU  - David Silver
AU  - Guy Lever
AU  - Nicolas Heess
AU  - Thomas Degris
AU  - Daan Wierstra
AU  - Martin Riedmiller
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-silver14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 387
EP  - 395
L1  - http://proceedings.mlr.press/v32/silver14.pdf
UR  - https://proceedings.mlr.press/v32/silver14.html
AB  - In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.
ER  -

APA


Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D. & Riedmiller, M.. (2014). Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):387-395 Available from https://proceedings.mlr.press/v32/silver14.html.

Deterministic Policy Gradient Algorithms

Abstract

Cite this Paper

Related Material