Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation

Tingting Zhao, Gang Niu, Ning Xie, Jucheng Yang, Masashi Sugiyama
Asian Conference on Machine Learning, PMLR 45:333-348, 2016.

Abstract

Policy gradient algorithms, which update the policy parameters along the direction of steepest ascent of the expected return, are widely used in reinforcement learning problems with continuous action spaces. However, the large variance of policy gradient estimates often makes policy updates unstable. In this paper, we propose to suppress this variance by directly employing the variance of the policy gradient as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique, combined with parameter-based exploration and baseline subtraction, provides more reliable policy updates than its non-regularized counterparts.
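
The sketch below illustrates the idea stated in the abstract, not the paper's exact algorithm: the update direction is the gradient of a regularized objective, namely the estimated expected return minus a constant lam times the trace of the (sample) variance of the policy gradient estimate. It assumes a PGPE-style Gaussian hyper-policy over policy parameters (parameter-based exploration), uses the mean return as the baseline, and takes the outer gradient by finite differences as a stand-in for the paper's analytic derivation. The names rollout, lam, sigma, and the toy usage at the end are illustrative assumptions.

import numpy as np

def pgpe_gradient_terms(rho, sigma, rollout, n=50, rng=None):
    # Per-sample PGPE gradient terms with baseline subtraction.
    # rho, sigma : mean / std of the Gaussian hyper-policy over policy parameters
    # rollout    : user-supplied function mapping policy parameters -> return
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed: common random numbers across calls
    thetas = rho + sigma * rng.standard_normal((n, rho.size))
    returns = np.array([rollout(th) for th in thetas])
    baseline = returns.mean()                      # baseline subtraction
    score = (thetas - rho) / sigma ** 2            # grad_rho log N(theta; rho, sigma^2 I)
    return score * (returns - baseline)[:, None], returns

def regularized_objective(rho, sigma, rollout, lam=0.1):
    # Estimated return minus lam * trace of the variance of the gradient estimator.
    terms, returns = pgpe_gradient_terms(rho, sigma, rollout)
    grad_var = terms.var(axis=0, ddof=1) / len(terms)   # variance of the mean estimator
    return returns.mean() - lam * grad_var.sum()

def regularized_update(rho, sigma, rollout, lam=0.1, lr=0.05, eps=1e-3):
    # One ascent step on the regularized objective via central differences
    # (a simple stand-in for the analytic gradient derived in the paper).
    grad = np.zeros_like(rho)
    for j in range(rho.size):
        e = np.zeros_like(rho)
        e[j] = eps
        grad[j] = (regularized_objective(rho + e, sigma, rollout, lam)
                   - regularized_objective(rho - e, sigma, rollout, lam)) / (2 * eps)
    return rho + lr * grad

# Illustrative usage on a made-up deterministic toy problem where the return
# is -||theta - 1||^2; exploration noise comes from the hyper-policy sampling.
rollout = lambda th: -np.sum((th - 1.0) ** 2)
rho = np.zeros(2)
for _ in range(200):
    rho = regularized_update(rho, sigma=0.5, rollout=rollout)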

Cite this Paper


BibTeX
@InProceedings{pmlr-v45-Zhao15b,
  title     = {Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation},
  author    = {Zhao, Tingting and Niu, Gang and Xie, Ning and Yang, Jucheng and Sugiyama, Masashi},
  booktitle = {Asian Conference on Machine Learning},
  pages     = {333--348},
  year      = {2016},
  editor    = {Holmes, Geoffrey and Liu, Tie-Yan},
  volume    = {45},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hong Kong},
  month     = {20--22 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v45/Zhao15b.pdf},
  url       = {https://proceedings.mlr.press/v45/Zhao15b.html},
  abstract  = {Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts.}
}
Endnote
%0 Conference Paper
%T Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
%A Tingting Zhao
%A Gang Niu
%A Ning Xie
%A Jucheng Yang
%A Masashi Sugiyama
%B Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Geoffrey Holmes
%E Tie-Yan Liu
%F pmlr-v45-Zhao15b
%I PMLR
%P 333--348
%U https://proceedings.mlr.press/v45/Zhao15b.html
%V 45
%X Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts.
RIS
TY  - CPAPER
TI  - Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
AU  - Tingting Zhao
AU  - Gang Niu
AU  - Ning Xie
AU  - Jucheng Yang
AU  - Masashi Sugiyama
BT  - Asian Conference on Machine Learning
DA  - 2016/02/25
ED  - Geoffrey Holmes
ED  - Tie-Yan Liu
ID  - pmlr-v45-Zhao15b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 45
SP  - 333
EP  - 348
L1  - http://proceedings.mlr.press/v45/Zhao15b.pdf
UR  - https://proceedings.mlr.press/v45/Zhao15b.html
AB  - Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts.
ER  -
APA
Zhao, T., Niu, G., Xie, N., Yang, J. & Sugiyama, M. (2016). Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation. Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 45:333-348. Available from https://proceedings.mlr.press/v45/Zhao15b.html.