Embedding Safety into RL: A New Take on Trust Region Methods

Nikola Milosevic, Johannes Müller, Nico Scherf
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:44199-44224, 2025.

Abstract

Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
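
As a point of reference for the abstract's comparison to TRPO and CPO, the standard trust-region subproblems those methods solve can be sketched as follows (taken from the general TRPO/CPO literature, not from this paper; $\pi_k$ is the current policy, $A^{\pi_k}$ and $A_C^{\pi_k}$ the reward and cost advantages, $\delta$ the trust-region radius, and $d$ the cost budget):

$$
\max_{\pi} \; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[ A^{\pi_k}(s,a) \right]
\quad \text{s.t.} \quad \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta
\qquad \text{(TRPO)}
$$

$$
\text{additionally} \quad J_C(\pi_k) + \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[ A_C^{\pi_k}(s,a) \right] \le d
\qquad \text{(CPO)}
$$

Per the abstract, C-TRPO instead modifies the divergence that defines the trust region itself, so that every policy inside the region satisfies the cost constraint; the precise form of that divergence is given in the paper, not on this page.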

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-milosevic25a,
  title     = {Embedding Safety into {RL}: A New Take on Trust Region Methods},
  author    = {Milosevic, Nikola and M\"{u}ller, Johannes and Scherf, Nico},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {44199--44224},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/milosevic25a/milosevic25a.pdf},
  url       = {https://proceedings.mlr.press/v267/milosevic25a.html},
  abstract  = {Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.}
}
Endnote
%0 Conference Paper
%T Embedding Safety into RL: A New Take on Trust Region Methods
%A Nikola Milosevic
%A Johannes Müller
%A Nico Scherf
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-milosevic25a
%I PMLR
%P 44199--44224
%U https://proceedings.mlr.press/v267/milosevic25a.html
%V 267
%X Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
APA
Milosevic, N., Müller, J. & Scherf, N. (2025). Embedding Safety into RL: A New Take on Trust Region Methods. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:44199-44224. Available from https://proceedings.mlr.press/v267/milosevic25a.html.
