Deep Off-Policy Iterative Learning Control

Swaminathan Gurumurthy, J Zico Kolter, Zachary Manchester
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:639-652, 2023.

Abstract

Reinforcement learning has emerged as a powerful paradigm to learn control policies while making few assumptions about the environment. However, this lack of assumptions in popular RL algorithms also leads to sample inefficiency. Furthermore, we often have access to a simulator that can provide approximate gradients for the rewards and dynamics of the environment. Iterative learning control (ILC) approaches have been shown to be very efficient at learning policies by using approximate simulator gradients to speed up optimization. However, they lack the generality of reinforcement learning approaches. In this paper, we take inspiration from ILC and propose an update equation for the value-function gradients (computed using the dynamics Jacobians and reward gradients obtained from an approximate simulator) to speed up value-function and policy optimization. We add this update to an off-the-shelf off-policy reinforcement learning algorithm and demonstrate that using the value-gradient update leads to a significant improvement in sample efficiency (and sometimes better performance) both when learning from scratch in a new environment and when fine-tuning a pre-trained policy in a new environment. Moreover, we observe that policies pretrained in the simulator using the simulator Jacobians obtain better zero-shot transfer performance and adapt much faster in a new environment.
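
To make the idea concrete, the sketch below shows one way a value-gradient backup built from approximate simulator derivatives could be wired into critic training in PyTorch. It is a minimal illustration under assumptions: the function names, the specific backup form, and the squared-error auxiliary loss are chosen for exposition and are not the paper's exact algorithm.

```python
import torch

# Illustrative sketch (assumed form, not the paper's exact update):
# build a target for the value-function gradient via a Bellman-style backup,
#   dV/ds(s) ≈ dr/ds(s, a) + gamma * (df/ds(s, a))^T dV/ds(s'),
# using reward gradients and dynamics Jacobians from an approximate simulator,
# then regress the critic's input-gradient toward that target.

def value_gradient_target(reward_grad_s, dyn_jac_s, next_value_grad, gamma=0.99):
    """Backup for value gradients using simulator derivatives.

    reward_grad_s:   dr/ds at (s, a),  shape (batch, state_dim)
    dyn_jac_s:       df/ds at (s, a),  shape (batch, state_dim, state_dim)
    next_value_grad: dV/ds' at s',     shape (batch, state_dim)
    """
    # Propagate the next-state value gradient back through the dynamics Jacobian.
    propagated = torch.bmm(
        dyn_jac_s.transpose(1, 2), next_value_grad.unsqueeze(-1)
    ).squeeze(-1)
    return reward_grad_s + gamma * propagated


def value_gradient_loss(value_net, states, grad_targets):
    """Auxiliary loss pushing the critic's input-gradient toward the target."""
    states = states.clone().requires_grad_(True)
    values = value_net(states).sum()
    # Gradient of the learned value function with respect to its state input.
    (value_grads,) = torch.autograd.grad(values, states, create_graph=True)
    return ((value_grads - grad_targets.detach()) ** 2).mean()
```

In an off-policy setup, a loss of this kind would simply be added to the usual critic objective on replay-buffer batches, with the simulator queried for the Jacobians and reward gradients at the stored state-action pairs.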

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-gurumurthy23b,
  title     = {Deep Off-Policy Iterative Learning Control},
  author    = {Gurumurthy, Swaminathan and Kolter, J Zico and Manchester, Zachary},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {639--652},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/gurumurthy23b/gurumurthy23b.pdf},
  url       = {https://proceedings.mlr.press/v211/gurumurthy23b.html},
  abstract  = {Reinforcement learning has emerged as a powerful paradigm to learn control policies while making few assumptions about the environment. However, this lack of assumptions in popular RL algorithms also leads to sample inefficiency. Furthermore, we often have access to a simulator that can provide approximate gradients for the rewards and dynamics of the environment. Iterative learning control (ILC) approaches have been shown to be very efficient at learning policies by using approximate simulator gradients to speed up optimization. However, they lack the generality of reinforcement learning approaches. In this paper, we take inspiration from ILC and propose an update equation for the value-function gradients (computed using the dynamics Jacobians and reward gradient obtained from an approximate simulator) to speed up value-function and policy optimization. We add this update to an off-the-shelf off-policy reinforcement learning algorithm and demonstrate that using the value-gradient update leads to a significant improvement in sample efficiency (and sometimes better performance) both when learning from scratch in a new environment and while fine-tuning a pre-trained policy in a new environment. Moreover, we observe that policies pretrained in the simulator using the simulator jacobians obtain better zero-shot transfer performance and adapt much faster in a new environment.}
}
Endnote
%0 Conference Paper
%T Deep Off-Policy Iterative Learning Control
%A Swaminathan Gurumurthy
%A J Zico Kolter
%A Zachary Manchester
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-gurumurthy23b
%I PMLR
%P 639--652
%U https://proceedings.mlr.press/v211/gurumurthy23b.html
%V 211
%X Reinforcement learning has emerged as a powerful paradigm to learn control policies while making few assumptions about the environment. However, this lack of assumptions in popular RL algorithms also leads to sample inefficiency. Furthermore, we often have access to a simulator that can provide approximate gradients for the rewards and dynamics of the environment. Iterative learning control (ILC) approaches have been shown to be very efficient at learning policies by using approximate simulator gradients to speed up optimization. However, they lack the generality of reinforcement learning approaches. In this paper, we take inspiration from ILC and propose an update equation for the value-function gradients (computed using the dynamics Jacobians and reward gradient obtained from an approximate simulator) to speed up value-function and policy optimization. We add this update to an off-the-shelf off-policy reinforcement learning algorithm and demonstrate that using the value-gradient update leads to a significant improvement in sample efficiency (and sometimes better performance) both when learning from scratch in a new environment and while fine-tuning a pre-trained policy in a new environment. Moreover, we observe that policies pretrained in the simulator using the simulator jacobians obtain better zero-shot transfer performance and adapt much faster in a new environment.
APA
Gurumurthy, S., Kolter, J.Z. & Manchester, Z. (2023). Deep Off-Policy Iterative Learning Control. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:639-652. Available from https://proceedings.mlr.press/v211/gurumurthy23b.html.