Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Tyler Westenbroek; Jacob Levy; David Fridovich-Keil

Proceedings of The 7th Conference on Robot Learning, PMLR 229:2478-2497, 2023.

Abstract

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper, we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach (1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and (2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.

Cite this Paper

BibTeX


@InProceedings{pmlr-v229-westenbroek23a,
  title = 	 {Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models},
  author =       {Westenbroek, Tyler and Levy, Jacob and Fridovich-Keil, David},
  booktitle = 	 {Proceedings of The 7th Conference on Robot Learning},
  pages = 	 {2478--2497},
  year = 	 {2023},
  editor = 	 {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = 	 {229},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v229/westenbroek23a/westenbroek23a.pdf},
  url = 	 {https://proceedings.mlr.press/v229/westenbroek23a.html},
  abstract = 	 {We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.  In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation.  However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.}
}

Endnote

%0 Conference Paper
%T Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models
%A Tyler Westenbroek
%A Jacob Levy
%A David Fridovich-Keil
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-westenbroek23a
%I PMLR
%P 2478--2497
%U https://proceedings.mlr.press/v229/westenbroek23a.html
%V 229
%X We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.  In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation.  However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.

APA


Westenbroek, T., Levy, J. & Fridovich-Keil, D. (2023). Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2478-2497. Available from https://proceedings.mlr.press/v229/westenbroek23a.html.
