Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
Asian Conference on Machine Learning, PMLR 45:333-348, 2016.
Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces, which update the policy parameters along the steepest direction of the expected return. However, large variance of policy gradient estimation often causes instability of policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique combined with parameter-based exploration and baseline subtraction provides more reliable policy updates than non-regularized counterparts.