Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

Jonas Günster, Puze Liu, Jan Peters, Davide Tateo
Proceedings of The 8th Conference on Robot Learning, PMLR 270:4670-4697, 2025.

Abstract

Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.

Cite this Paper
BibTeX
@InProceedings{pmlr-v270-gunster25a,
  title     = {Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning},
  author    = {G{\"{u}}nster, Jonas and Liu, Puze and Peters, Jan and Tateo, Davide},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {4670--4697},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/gunster25a/gunster25a.pdf},
  url       = {https://proceedings.mlr.press/v270/gunster25a.html},
  abstract  = {Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.}
}
Endnote
%0 Conference Paper
%T Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning
%A Jonas Günster
%A Puze Liu
%A Jan Peters
%A Davide Tateo
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-gunster25a
%I PMLR
%P 4670--4697
%U https://proceedings.mlr.press/v270/gunster25a.html
%V 270
%X Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.
APA
Günster, J., Liu, P., Peters, J., & Tateo, D. (2025). Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:4670-4697. Available from https://proceedings.mlr.press/v270/gunster25a.html.

Related Material