Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training-Time Safety Violations

Pedram Rabiee, Amirsaeid Safari
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:1326-1337, 2025.

Abstract

This paper introduces the Reinforcement Learning Backup Shield (RLBUS), a framework that guarantees safe exploration in reinforcement learning (RL) by incorporating Backup Control Barrier Functions (BCBFs). RLBUS synthesizes an implicit control forward invariant subset of the safe set using multiple backup policies, ensuring safety in the presence of input constraints. While traditional BCBFs often yield conservative control forward invariant sets due to the design of backup controllers, RLBUS addresses this limitation by leveraging model-free RL to train an additional backup policy, enlarging the identified forward invariant subset of the safe set. This approach enables safe exploration of larger regions of the state space with zero safety violations during training. The effectiveness of RLBUS is demonstrated on an inverted pendulum example, where the expanded invariant set facilitates safe exploration over a broader state space, enhancing performance without compromising safety.
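
For readers new to backup shielding, the following minimal sketch illustrates the general mechanism the abstract describes: the RL action is applied only if some backup policy can certify safety from the state it leads to, and otherwise a certifying backup action is applied instead. This is our own illustration under assumed discrete-time dynamics, not code from the paper; the names (shield, rollout_safe, f, h, backup_policies) are hypothetical, and RLBUS itself certifies safety with backup control barrier functions over the continuous-time flow rather than this naive per-step rollout check.

    def rollout_safe(x, policy, f, h, dt, horizon):
        """Simulate the closed loop under a backup policy and check that the
        safety function h stays nonnegative along the whole trajectory."""
        for _ in range(int(horizon / dt)):
            if h(x) < 0.0:
                return False
            x = x + dt * f(x, policy(x))  # forward-Euler approximation of the flow
        return h(x) >= 0.0

    def shield(x, u_rl, f, h, backup_policies, dt=0.01, horizon=1.0):
        """Backup-shield filter: pass the RL action through only if, from the
        state it leads to, at least one backup policy keeps the trajectory in
        the safe set {x : h(x) >= 0}; otherwise override with a backup action
        that certifies safety from the current state."""
        x_next = x + dt * f(x, u_rl)
        for pi_b in backup_policies:
            if rollout_safe(x_next, pi_b, f, h, dt, horizon):
                return u_rl                # RL action certified safe
        for pi_b in backup_policies:
            if rollout_safe(x, pi_b, f, h, dt, horizon):
                return pi_b(x)             # fall back to a safe backup action
        raise RuntimeError("no backup policy certifies safety from the current state")

In these terms, the role of the RL-trained backup policy in RLBUS is easy to state: adding it to backup_policies enlarges the set of states from which certification succeeds, so exploration can safely visit a larger region of the state space.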

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-rabiee25a,
  title     = {Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training-Time Safety Violations},
  author    = {Rabiee, Pedram and Safari, Amirsaeid},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {1326--1337},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/rabiee25a/rabiee25a.pdf},
  url       = {https://proceedings.mlr.press/v283/rabiee25a.html},
  abstract  = {This paper introduces the Reinforcement Learning Backup Shield (RLBUS), a framework that guarantees safe exploration in reinforcement learning (RL) by incorporating Backup Control Barrier Functions (BCBFs). RLBUS synthesizes an implicit control forward invariant subset of the safe set using multiple backup policies, ensuring safety in the presence of input constraints. While traditional BCBFs often yield conservative control forward invariant sets due to the design of backup controllers, RLBUS addresses this limitation by leveraging model-free RL to train an additional backup policy, enlarging the identified forward invariant subset of the safe set. This approach enables safe exploration of larger regions of the state space with zero safety violations during training. The effectiveness of RLBUS is demonstrated on an inverted pendulum example, where the expanded invariant set facilitates safe exploration over a broader state space, enhancing performance without compromising safety.}
}
Endnote
%0 Conference Paper
%T Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training-Time Safety Violations
%A Pedram Rabiee
%A Amirsaeid Safari
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-rabiee25a
%I PMLR
%P 1326-1337
%U https://proceedings.mlr.press/v283/rabiee25a.html
%V 283
%X This paper introduces the Reinforcement Learning Backup Shield (RLBUS), a framework that guarantees safe exploration in reinforcement learning (RL) by incorporating Backup Control Barrier Functions (BCBFs). RLBUS synthesizes an implicit control forward invariant subset of the safe set using multiple backup policies, ensuring safety in the presence of input constraints. While traditional BCBFs often yield conservative control forward invariant sets due to the design of backup controllers, RLBUS addresses this limitation by leveraging model-free RL to train an additional backup policy, enlarging the identified forward invariant subset of the safe set. This approach enables safe exploration of larger regions of the state space with zero safety violations during training. The effectiveness of RLBUS is demonstrated on an inverted pendulum example, where the expanded invariant set facilitates safe exploration over a broader state space, enhancing performance without compromising safety.
APA
Rabiee, P. & Safari, A. (2025). Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training-Time Safety Violations. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:1326-1337. Available from https://proceedings.mlr.press/v283/rabiee25a.html.