Safe Reinforcement Learning with Chance-constrained Model Predictive Control

Samuel Pfrommer, Tanmay Gautam, Alec Zhou, Somayeh Sojoudi
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:291-303, 2022.

Abstract

Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints. We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy gradient framework in a linear setting with continuous actions. The guide enforces safe operation of the system by embedding safety requirements as chance constraints in the MPC formulation. The policy gradient training step then includes a safety penalty which trains the base policy to behave safely. We show theoretically that this penalty allows for a provably safe optimal base policy and illustrate our method with a simulated linearized quadrotor experiment.
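To make the setup in the abstract concrete, the sketch below shows one common way an MPC-based safety guide with chance constraints can be posed: the guide minimally perturbs the base policy's proposed action so that the predicted mean trajectory satisfies tightened half-space constraints under linear-Gaussian dynamics. This is an illustrative sketch under assumed structure, not the paper's exact formulation; the names (safety_guide, A, B, W, H, b, delta, horizon) and the Gaussian-quantile constraint tightening are assumptions.

# A minimal sketch of a chance-constrained MPC safety guide, assuming
# linear-Gaussian dynamics x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, W)
# and half-space state constraints h_i^T x <= b_i required to hold with
# probability at least 1 - delta. Illustrative only, not the paper's code.
import numpy as np
import cvxpy as cp
from scipy.stats import norm


def safety_guide(x0, u_base, A, B, W, H, b, delta=0.05, horizon=10):
    """Project the base policy's action onto the chance-constrained safe set.

    Minimizes ||u_0 - u_base||^2 over an open-loop input sequence whose
    predicted mean trajectory satisfies the tightened deterministic
    reformulation of each chance constraint:
        h_i^T mu_t <= b_i - Phi^{-1}(1 - delta) * sqrt(h_i^T Sigma_t h_i),
    where Sigma_t is the disturbance covariance propagated through A.
    """
    n, m = B.shape
    U = cp.Variable((m, horizon))
    mu = [x0]                      # mean state trajectory
    Sigma = np.zeros((n, n))       # state covariance (x0 assumed known exactly)
    quantile = norm.ppf(1.0 - delta)
    constraints = []

    for t in range(horizon):
        mu.append(A @ mu[t] + B @ U[:, t])
        Sigma = A @ Sigma @ A.T + W      # propagate disturbance covariance
        for h_i, b_i in zip(H, b):
            # Tightened half-space constraint on the mean trajectory.
            margin = quantile * np.sqrt(h_i @ Sigma @ h_i)
            constraints.append(h_i @ mu[t + 1] <= b_i - margin)

    # Stay as close as possible to the base policy's proposed action.
    # (A real implementation would also handle infeasibility, e.g. with slacks.)
    objective = cp.Minimize(cp.sum_squares(U[:, 0] - u_base))
    cp.Problem(objective, constraints).solve()
    return U.value[:, 0]           # safe action actually applied to the system

In the paper's framework, the policy gradient step additionally includes a safety penalty so that the base policy itself learns to propose actions the guide rarely needs to correct; one natural (assumed) instantiation of such a penalty is the magnitude of the guide's correction to the proposed action.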

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-pfrommer22a,
  title     = {Safe Reinforcement Learning with Chance-constrained Model Predictive Control},
  author    = {Pfrommer, Samuel and Gautam, Tanmay and Zhou, Alec and Sojoudi, Somayeh},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {291--303},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/pfrommer22a/pfrommer22a.pdf},
  url       = {https://proceedings.mlr.press/v168/pfrommer22a.html},
  abstract  = {Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints. We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy gradient framework in a linear setting with continuous actions. The guide enforces safe operation of the system by embedding safety requirements as chance constraints in the MPC formulation. The policy gradient training step then includes a safety penalty which trains the base policy to behave safely. We show theoretically that this penalty allows for a provably safe optimal base policy and illustrate our method with a simulated linearized quadrotor experiment.}
}
Endnote
%0 Conference Paper
%T Safe Reinforcement Learning with Chance-constrained Model Predictive Control
%A Samuel Pfrommer
%A Tanmay Gautam
%A Alec Zhou
%A Somayeh Sojoudi
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-pfrommer22a
%I PMLR
%P 291--303
%U https://proceedings.mlr.press/v168/pfrommer22a.html
%V 168
%X Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints. We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy gradient framework in a linear setting with continuous actions. The guide enforces safe operation of the system by embedding safety requirements as chance constraints in the MPC formulation. The policy gradient training step then includes a safety penalty which trains the base policy to behave safely. We show theoretically that this penalty allows for a provably safe optimal base policy and illustrate our method with a simulated linearized quadrotor experiment.
APA
Pfrommer, S., Gautam, T., Zhou, A. & Sojoudi, S. (2022). Safe Reinforcement Learning with Chance-constrained Model Predictive Control. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:291-303. Available from https://proceedings.mlr.press/v168/pfrommer22a.html.