Fourier Policy Gradients

Matthew Fellows, Kamil Ciosek, Shimon Whiteson
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1486-1495, 2018.

Abstract

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
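The abstract's key mechanism is the convolution theorem: under the Fourier transform, convolution becomes pointwise multiplication. A minimal self-contained sketch of that identity for discrete circular convolution (plain Python, not the paper's actual derivation, which operates on policy-gradient integrals):

```python
import cmath

def dft(x):
    """Discrete Fourier transform of a real/complex sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def circular_convolve(a, b):
    """Direct O(n^2) circular convolution."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

# Example signals (arbitrary illustrative values).
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.0, -1.0, 2.0]

direct = circular_convolve(a, b)
# Convolution theorem: convolve in signal space == multiply in Fourier space.
via_fourier = [z.real for z in idft([x * y for x, y in zip(dft(a), dft(b))])]

assert all(abs(u - v) < 1e-9 for u, v in zip(direct, via_fourier))
```

The paper applies the continuous analogue of this identity: the integrals arising in expected policy gradients take the form of convolutions between the policy density and the critic's basis functions, so in Fourier space they reduce to products that can be solved analytically.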

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-fellows18a,
  title     = {{F}ourier Policy Gradients},
  author    = {Fellows, Matthew and Ciosek, Kamil and Whiteson, Shimon},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1486--1495},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/fellows18a/fellows18a.pdf},
  url       = {https://proceedings.mlr.press/v80/fellows18a.html},
  abstract  = {We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.}
}
Endnote
%0 Conference Paper
%T Fourier Policy Gradients
%A Matthew Fellows
%A Kamil Ciosek
%A Shimon Whiteson
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-fellows18a
%I PMLR
%P 1486--1495
%U https://proceedings.mlr.press/v80/fellows18a.html
%V 80
%X We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
APA
Fellows, M., Ciosek, K. & Whiteson, S. (2018). Fourier Policy Gradients. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1486-1495. Available from https://proceedings.mlr.press/v80/fellows18a.html.