Projections for Approximate Policy Iteration Algorithms

Riad Akrour, Joni Pajarinen, Jan Peters, Gerhard Neumann
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:181-190, 2019.

Abstract

Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature for solving this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
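
To make the idea concrete, here is a minimal sketch (assuming NumPy; the constraint choice, toy objective, and names such as project_entropy are illustrative, not taken from the paper). It uses an entropy-equality constraint on a diagonal Gaussian policy as a stand-in for the constrained policy update: a common shift of the log standard deviations plays the role of the projection, so the constraint holds exactly by construction and plain gradient descent on the unconstrained parameters suffices.

import numpy as np

def entropy(log_std):
    """Entropy of N(mu, diag(exp(log_std)^2)); independent of the mean."""
    d = log_std.size
    return 0.5 * d * np.log(2.0 * np.pi * np.e) + log_std.sum()

def project_entropy(log_std, beta):
    """Shift all log-stds by the same constant so the entropy equals beta."""
    d = log_std.size
    return log_std + (beta - entropy(log_std)) / d

# Toy unconstrained objective: pull the log-stds toward some target values
# while the projection keeps the entropy pinned at beta.
target = np.array([0.3, -0.5, 0.1])
beta = entropy(np.zeros(3))      # entropy of a unit-variance Gaussian

rho = np.random.randn(3)         # unconstrained parameters
lr = 0.1
for _ in range(200):
    log_std = project_entropy(rho, beta)
    # Gradient of 0.5 * ||log_std - target||^2 w.r.t. rho: the projection's
    # Jacobian is (I - (1/d) * ones), which removes the mean component.
    g = log_std - target
    g = g - g.mean()
    rho -= lr * g

log_std = project_entropy(rho, beta)
print("entropy:", entropy(log_std), "target entropy:", beta)

Every iterate is feasible by construction, so no constrained solver or Lagrangian heuristic is needed; this is the sense in which a projection turns the constrained update into an unconstrained one.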

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-akrour19a,
  title     = {Projections for Approximate Policy Iteration Algorithms},
  author    = {Akrour, Riad and Pajarinen, Joni and Peters, Jan and Neumann, Gerhard},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {181--190},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/akrour19a/akrour19a.pdf},
  url       = {https://proceedings.mlr.press/v97/akrour19a.html}
}
Endnote
%0 Conference Paper
%T Projections for Approximate Policy Iteration Algorithms
%A Riad Akrour
%A Joni Pajarinen
%A Jan Peters
%A Gerhard Neumann
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-akrour19a
%I PMLR
%P 181--190
%U https://proceedings.mlr.press/v97/akrour19a.html
%V 97
APA
Akrour, R., Pajarinen, J., Peters, J., & Neumann, G. (2019). Projections for Approximate Policy Iteration Algorithms. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:181-190. Available from https://proceedings.mlr.press/v97/akrour19a.html.