Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation

Tingting Zhao; Gang Niu; Ning Xie; Jucheng Yang; Masashi Sugiyama

Asian Conference on Machine Learning, PMLR 45:333-348, 2016.

Abstract

Policy gradient algorithms, which update the policy parameters along the steepest direction of the expected return, are widely used in reinforcement learning problems with continuous action spaces. However, the large variance of policy gradient estimates often makes policy updates unstable. In this paper, we propose to suppress this variance by directly employing the variance of the policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique, combined with parameter-based exploration and baseline subtraction, provides more reliable policy updates than its non-regularized counterparts.
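The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a one-dimensional Gaussian over policy parameters with mean `eta` and fixed standard deviation `sigma` (parameter-based exploration), uses the mean return as a simple stand-in baseline, and penalizes the empirical variance of the per-sample gradients with a weight `lam`. The derivative of the variance penalty is taken by central finite differences on shared noise samples, which is a swapped-in technique chosen for brevity. All function names, the toy reward, and the constants are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Tuple


def per_sample_gradients(eta: float, sigma: float, eps: List[float],
                         reward: Callable[[float], float],
                         use_baseline: bool = True) -> List[float]:
    """Per-sample gradient estimates of the expected return w.r.t. eta."""
    # Parameter-based exploration: each policy parameter is theta = eta + sigma * e.
    returns = [reward(eta + sigma * e) for e in eps]
    # Assumption: the sample mean return serves as the baseline.
    baseline = statistics.fmean(returns) if use_baseline else 0.0
    # Score of N(theta | eta, sigma^2) w.r.t. eta is (theta - eta) / sigma^2 = e / sigma.
    return [(r - baseline) * e / sigma for r, e in zip(returns, eps)]


def gradient_and_variance(eta: float, sigma: float, eps: List[float],
                          reward: Callable[[float], float],
                          use_baseline: bool = True) -> Tuple[float, float]:
    """Mean gradient estimate and the empirical variance of its samples."""
    grads = per_sample_gradients(eta, sigma, eps, reward, use_baseline)
    return statistics.fmean(grads), statistics.pvariance(grads)


def regularized_direction(eta: float, sigma: float, eps: List[float],
                          reward: Callable[[float], float],
                          lam: float, h: float = 1e-4) -> float:
    """Ascent direction for J(eta) - lam * Var[gradient estimate]."""
    grad, _ = gradient_and_variance(eta, sigma, eps, reward)
    # Central finite difference of the variance, reusing the same noise samples.
    _, var_plus = gradient_and_variance(eta + h, sigma, eps, reward)
    _, var_minus = gradient_and_variance(eta - h, sigma, eps, reward)
    dvar = (var_plus - var_minus) / (2.0 * h)
    return grad - lam * dvar


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    toy_reward = lambda theta: 100.0 - (theta - 3.0) ** 2  # hypothetical return
    eta = 0.0
    for _ in range(50):
        eta += 0.05 * regularized_direction(eta, 1.0, noise, toy_reward, lam=1e-3)
    print(eta)
```

Setting `lam=0.0` recovers the plain baseline-subtracted update, which makes it easy to compare regularized and non-regularized behaviour on the same noise samples.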

Cite this Paper

BibTeX


@InProceedings{pmlr-v45-Zhao15b,
  title     = {Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation},
  author    = {Zhao, Tingting and Niu, Gang and Xie, Ning and Yang, Jucheng and Sugiyama, Masashi},
  booktitle = {Asian Conference on Machine Learning},
  pages     = {333--348},
  year      = {2016},
  editor    = {Holmes, Geoffrey and Liu, Tie-Yan},
  volume    = {45},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hong Kong},
  month     = {20--22 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v45/Zhao15b.pdf},
  url       = {https://proceedings.mlr.press/v45/Zhao15b.html},
  abstract = 	 {Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts. }
}

Endnote

%0 Conference Paper
%T Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
%A Tingting Zhao
%A Gang Niu
%A Ning Xie
%A Jucheng Yang
%A Masashi Sugiyama
%B Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Geoffrey Holmes
%E Tie-Yan Liu	
%F pmlr-v45-Zhao15b
%I PMLR
%P 333--348
%U https://proceedings.mlr.press/v45/Zhao15b.html
%V 45
%X Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts.

RIS


TY  - CPAPER
TI  - Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
AU  - Tingting Zhao
AU  - Gang Niu
AU  - Ning Xie
AU  - Jucheng Yang
AU  - Masashi Sugiyama
BT  - Asian Conference on Machine Learning
DA  - 2016/02/25
ED  - Geoffrey Holmes
ED  - Tie-Yan Liu	
ID  - pmlr-v45-Zhao15b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 45
SP  - 333
EP  - 348
L1  - http://proceedings.mlr.press/v45/Zhao15b.pdf
UR  - https://proceedings.mlr.press/v45/Zhao15b.html
AB  - Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts. 
ER  -

APA


Zhao, T., Niu, G., Xie, N., Yang, J. & Sugiyama, M. (2016). Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation. Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 45:333-348. Available from https://proceedings.mlr.press/v45/Zhao15b.html.
