Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Wesley Suttle, Vipul Kumar Sharma, Krishna Chaitanya Kosaraju, Sivaranjani Seetharaman, Ji Liu, Vijay Gupta, Brian M Sadler
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4420-4428, 2024.

Abstract

We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.
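To make the contrast in the abstract concrete, below is a minimal, self-contained sketch of the two mechanisms it describes: a two-stage CBF safety filter that projects a learned action onto the safe half-space, and a generic sampling-based routine that only ever emits actions already satisfying the barrier condition. Everything here is an illustrative assumption rather than the paper's algorithm: the toy control-affine dynamics f and g, the barrier h, the gain ALPHA, the Gaussian policy, and the sample budget are all made up for the example, and the rejection-sampling loop is a stand-in for the single-stage sampling idea, not the authors' method.

```python
import numpy as np

# Illustrative control-affine system x' = f(x) + g(x) u with state
# x = (position, velocity) and scalar control u (all choices are toy examples).
def f(x):
    return np.array([x[1], 0.0])          # drift term

def g(x):
    return np.array([0.0, 1.0])           # control direction

def h(x):
    return 1.0 - x[0] - x[1]              # candidate barrier: safe set {h(x) >= 0}

def grad_h(x):
    return np.array([-1.0, -1.0])

ALPHA = 1.0                               # class-K gain in the CBF condition (illustrative)

def cbf_margin(x, u):
    """Left-hand side of the CBF condition grad_h . (f + g u) + ALPHA * h >= 0."""
    return grad_h(x) @ (f(x) + g(x) * u) + ALPHA * h(x)

def safety_filter(x, u_rl):
    """Two-stage flavor: project the (possibly unsafe) RL action onto the CBF
    half-space a*u + b >= 0. With one scalar control and one linear constraint,
    the QP min ||u - u_rl||^2 has the closed-form projection below (assumes the
    barrier has relative degree one, i.e. a != 0)."""
    a = grad_h(x) @ g(x)
    b = grad_h(x) @ f(x) + ALPHA * h(x)
    if a * u_rl + b >= 0.0:
        return u_rl                                 # already safe: pass through
    return u_rl - (a * u_rl + b) / (a * a) * a      # minimal correction to the boundary

def sample_safe_action(x, policy, n_samples=64, rng=None):
    """Single-stage flavor (generic stand-in, not the paper's algorithm): draw
    candidate actions from the policy and keep the first one that already
    satisfies the CBF condition; fall back to projection if none is found."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(n_samples):
        u = policy(x, rng)
        if cbf_margin(x, u) >= 0.0:
            return u
    return safety_filter(x, policy(x, rng))

# Example usage with a placeholder Gaussian policy.
policy = lambda x, rng: rng.normal(loc=0.0, scale=1.0)
x0 = np.array([0.9, 0.5])
print(safety_filter(x0, 3.0))             # projects the unsafe action 3.0 to -0.9
print(sample_safe_action(x0, policy))     # returns a sampled action with cbf_margin >= 0
```

Note the design point the abstract turns on: the projection step in `safety_filter` is applied after the fact to whatever the RL policy proposes, which is what can break the underlying method's convergence analysis, whereas a sampling-based scheme only ever executes actions drawn from the safe set.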

Cite this Paper

BibTeX
@InProceedings{pmlr-v238-suttle24a,
  title     = {Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems},
  author    = {Suttle, Wesley and Kumar Sharma, Vipul and Chaitanya Kosaraju, Krishna and Seetharaman, Sivaranjani and Liu, Ji and Gupta, Vijay and M Sadler, Brian},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4420--4428},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/suttle24a/suttle24a.pdf},
  url       = {https://proceedings.mlr.press/v238/suttle24a.html},
  abstract  = {We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.}
}
Endnote
%0 Conference Paper
%T Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
%A Wesley Suttle
%A Vipul Kumar Sharma
%A Krishna Chaitanya Kosaraju
%A Sivaranjani Seetharaman
%A Ji Liu
%A Vijay Gupta
%A Brian M Sadler
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-suttle24a
%I PMLR
%P 4420--4428
%U https://proceedings.mlr.press/v238/suttle24a.html
%V 238
%X We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.
APA
Suttle, W., Kumar Sharma, V., Chaitanya Kosaraju, K., Seetharaman, S., Liu, J., Gupta, V. & M Sadler, B. (2024). Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4420-4428. Available from https://proceedings.mlr.press/v238/suttle24a.html.