Clipped Action Policy Gradient

Yasuhiro Fujita, Shin-ichi Maeda
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1597-1606, 2018.

Abstract

Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
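To make the abstract's idea concrete, here is a minimal sketch, assuming a diagonal Gaussian policy over a bounded action space and using PyTorch; the function name and signature are illustrative assumptions, not the authors' API, whose actual implementation is in the repository linked above. The idea: when a sampled action falls outside the bounds, the executed (clipped) action carries the entire tail probability mass clipped onto that bound, and CAPG differentiates the log of that mass instead of the log-density of the raw sample.

```python
# Sketch of the CAPG idea (illustrative names; see the authors' repository
# https://github.com/pfnet-research/capg for the real implementation).
import torch
from torch.distributions import Normal


def capg_log_prob(mean, std, raw_action, low, high):
    """Log-probability of the clipped action under the clipped-action distribution.

    Inside the bounds this is the usual Gaussian log-density; outside, the
    clipped action sits on a bound, and its probability is the whole tail
    mass clipped onto that bound.
    """
    dist = Normal(mean, std)
    clipped = raw_action.clamp(low, high)

    log_density = dist.log_prob(clipped)  # case: action inside the bounds
    log_lower = torch.log(dist.cdf(torch.tensor(low)).clamp_min(1e-12))         # log P(u <= low)
    log_upper = torch.log((1 - dist.cdf(torch.tensor(high))).clamp_min(1e-12))  # log P(u >= high)

    log_prob = torch.where(raw_action <= low, log_lower, log_density)
    log_prob = torch.where(raw_action >= high, log_upper, log_prob)
    return log_prob.sum(dim=-1)  # sum over independent action dimensions
```

Replacing the conventional score function, the gradient of the log-density of the raw sample, with the gradient of this quantity is what the paper proves to be unbiased with lower variance.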

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-fujita18a,
  title     = {Clipped Action Policy Gradient},
  author    = {Fujita, Yasuhiro and Maeda, Shin-ichi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1597--1606},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/fujita18a/fujita18a.pdf},
  url       = {https://proceedings.mlr.press/v80/fujita18a.html},
  abstract  = {Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.}
}
Endnote
%0 Conference Paper
%T Clipped Action Policy Gradient
%A Yasuhiro Fujita
%A Shin-ichi Maeda
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-fujita18a
%I PMLR
%P 1597--1606
%U https://proceedings.mlr.press/v80/fujita18a.html
%V 80
%X Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
APA
Fujita, Y. & Maeda, S. (2018). Clipped Action Policy Gradient. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1597-1606. Available from https://proceedings.mlr.press/v80/fujita18a.html.
