Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Tyler Westenbroek, Jacob Levy, David Fridovich-Keil
Proceedings of The 7th Conference on Robot Learning, PMLR 229:2478-2497, 2023.

Abstract

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach 1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and 2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
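To make the two ingredients named in the abstract concrete, the sketch below illustrates, in JAX, how an approximate physics model can both (a) supply analytic derivatives for a first-order policy-gradient estimate and (b) define a low-level tracking controller embedded inside the policy class. This is a minimal illustration of the general idea under assumed double-integrator dynamics and a quadratic cost; all names (f_nominal, tracking_controller, rollout_cost, the linear policy parameterization) are hypothetical and do not come from the paper or its code.

    import jax
    import jax.numpy as jnp

    # Illustrative sketch only (not the authors' implementation).

    def f_nominal(x, u, dt=0.05):
        # Hypothetical simplified first-principles model: a 2D double integrator.
        pos, vel = x[:2], x[2:]
        return jnp.concatenate([pos + dt * vel, vel + dt * u])

    def tracking_controller(x, x_des, kp=4.0, kd=2.0):
        # Low-level PD tracking law designed from the nominal model.
        return kp * (x_des[:2] - x[:2]) + kd * (x_des[2:] - x[2:])

    def policy(params, x):
        # Learned component outputs a desired state; the embedded
        # tracking controller converts it into an actuator command.
        x_des = params["W"] @ x + params["b"]
        return tracking_controller(x, x_des)

    def rollout_cost(params, x0, x_goal, horizon=50):
        # Roll the nominal model forward under the policy and accumulate
        # a quadratic state-tracking cost plus a small control penalty.
        def step(x, _):
            u = policy(params, x)
            x_next = f_nominal(x, u)
            cost = jnp.sum((x_next - x_goal) ** 2) + 1e-3 * jnp.sum(u ** 2)
            return x_next, cost
        _, costs = jax.lax.scan(step, x0, None, length=horizon)
        return jnp.sum(costs)

    # First-order policy gradient obtained by differentiating through the
    # approximate model, in contrast to zeroth-order / likelihood-ratio
    # estimators that require many rollouts.
    policy_grad = jax.grad(rollout_cost)

    if __name__ == "__main__":
        params = {"W": jnp.zeros((4, 4)), "b": jnp.zeros(4)}
        x0 = jnp.array([0.0, 0.0, 0.0, 0.0])
        x_goal = jnp.array([1.0, 1.0, 0.0, 0.0])
        grads = policy_grad(params, x0, x_goal)
        print(jax.tree_util.tree_map(jnp.shape, grads))

In this toy setup the learned parameters only shape the desired state fed to the tracking controller, so even a crude model-derivative gradient tends to yield stable behavior; the paper's analysis and hardware experiments address why such an embedded feedback controller helps when the model is only approximate.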

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-westenbroek23a,
  title     = {Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models},
  author    = {Westenbroek, Tyler and Levy, Jacob and Fridovich-Keil, David},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {2478--2497},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/westenbroek23a/westenbroek23a.pdf},
  url       = {https://proceedings.mlr.press/v229/westenbroek23a.html},
  abstract  = {We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach 1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and 2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.}
}
Endnote
%0 Conference Paper
%T Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models
%A Tyler Westenbroek
%A Jacob Levy
%A David Fridovich-Keil
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-westenbroek23a
%I PMLR
%P 2478--2497
%U https://proceedings.mlr.press/v229/westenbroek23a.html
%V 229
%X We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach 1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and 2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
APA
Westenbroek, T., Levy, J. & Fridovich-Keil, D. (2023). Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2478-2497. Available from https://proceedings.mlr.press/v229/westenbroek23a.html.