Model-Free Trajectory Optimization for Reinforcement Learning

Riad Akrour, Gerhard Neumann, Hany Abdulsamad, Abbas Abdolmaleki
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2961-2970, 2016.

Abstract

Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
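As a rough illustration of the update described in the abstract, the sketch below performs a single KL-regularized refit of a time-dependent linear-Gaussian policy against a local quadratic model of the Q-function. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function name kl_regularized_update, the fixed trade-off eta, and all dimensions and values are hypothetical, and the paper instead tunes the multiplier so that the KL constraint is met exactly.

import numpy as np

# Minimal sketch (assumptions, not the paper's exact method):
# old policy  pi_old(a|s) = N(K s + k, Sigma)  and a local quadratic Q-model
#   Q(s, a) ~= 0.5 a^T Qaa a + a^T (Qas s + qa) + const(s).
# Maximizing  E_pi[Q] - eta * KL(pi || pi_old)  over Gaussian policies has the
# closed-form solution implemented below.
def kl_regularized_update(K, k, Sigma, Qaa, Qas, qa, eta):
    """Return (K_new, k_new, Sigma_new) of the updated linear-Gaussian policy."""
    prec_old = np.linalg.inv(Sigma)
    # New precision: old precision minus the curvature of Q scaled by 1/eta
    # (assumes Qaa is negative definite enough to keep the result positive definite).
    prec_new = prec_old - Qaa / eta
    Sigma_new = np.linalg.inv(prec_new)
    # New mean blends the old mean with the Q-model's linear term.
    K_new = Sigma_new @ (prec_old @ K + Qas / eta)
    k_new = Sigma_new @ (prec_old @ k + qa / eta)
    return K_new, k_new, Sigma_new

# Hypothetical dimensions and values, only to show the call.
ds, da = 3, 2
rng = np.random.default_rng(0)
K, k, Sigma = rng.normal(size=(da, ds)), rng.normal(size=da), np.eye(da)
Qaa = -np.eye(da)                                  # concave in the action
Qas, qa = rng.normal(size=(da, ds)), rng.normal(size=da)
K_new, k_new, Sigma_new = kl_regularized_update(K, k, Sigma, Qaa, Qas, qa, eta=2.0)

A smaller eta lets the quadratic Q-model pull the policy further from pi_old, while a larger eta recovers the old policy; this trade-off is what the KL constraint controls.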

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-akrour16,
  title     = {Model-Free Trajectory Optimization for Reinforcement Learning},
  author    = {Akrour, Riad and Neumann, Gerhard and Abdulsamad, Hany and Abdolmaleki, Abbas},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {2961--2970},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/akrour16.pdf},
  url       = {https://proceedings.mlr.press/v48/akrour16.html},
  abstract  = {Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.}
}
Endnote
%0 Conference Paper
%T Model-Free Trajectory Optimization for Reinforcement Learning
%A Riad Akrour
%A Gerhard Neumann
%A Hany Abdulsamad
%A Abbas Abdolmaleki
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-akrour16
%I PMLR
%P 2961--2970
%U https://proceedings.mlr.press/v48/akrour16.html
%V 48
%X Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
RIS
TY  - CPAPER
TI  - Model-Free Trajectory Optimization for Reinforcement Learning
AU  - Riad Akrour
AU  - Gerhard Neumann
AU  - Hany Abdulsamad
AU  - Abbas Abdolmaleki
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-akrour16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2961
EP  - 2970
L1  - http://proceedings.mlr.press/v48/akrour16.pdf
UR  - https://proceedings.mlr.press/v48/akrour16.html
AB  - Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
ER  -
APA
Akrour, R., Neumann, G., Abdulsamad, H. & Abdolmaleki, A. (2016). Model-Free Trajectory Optimization for Reinforcement Learning. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2961-2970. Available from https://proceedings.mlr.press/v48/akrour16.html.