A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

Ming Shi, Yingbin Liang, Ness Shroff
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:31243-31268, 2023.

Abstract

In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step and unsafe states and actions are avoided. However, existing algorithms for “safe” RL are often designed under constraints that either require expected cumulative costs to be bounded or assume all states are safe. Thus, such algorithms could violate instantaneous hard constraints and traverse unsafe states (and actions) in practice. In this paper, we develop the first near-optimal safe RL algorithm for episodic Markov Decision Processes with unsafe states and actions under instantaneous hard constraints and the linear mixture model. It achieves a regret of $\tilde{O}(\frac{d H^3 \sqrt{d K}}{\Delta_c})$ that nearly matches both the state-of-the-art regret in the setting with only unsafe actions and that in the unconstrained setting, and it is safe at each step, where $d$ is the feature-mapping dimension, $K$ is the number of episodes, $H$ is the episode length, and $\Delta_c$ is a safety-related parameter. We also provide a lower bound of $\tilde{\Omega}(\max\{d H \sqrt{K}, \frac{H}{\Delta_c^2}\})$, which indicates that the dependency on $\Delta_c$ is necessary. Further, both our algorithm design and regret analysis involve several novel ideas, which may be of independent interest.
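To make the contrast drawn in the abstract concrete, the display below sketches the two constraint types being compared; the notation (per-step cost $c_h$, thresholds $\bar{c}$ and $\bar{C}$, policy $\pi$) is illustrative and not taken from the paper.

% Illustrative comparison (notation is assumed, not the paper's):
% an instantaneous hard constraint must hold at every step h of every episode,
% whereas an expected cumulative cost constraint only bounds the total cost in expectation.
\begin{align*}
  \text{instantaneous hard constraint:}\quad
    & c_h(s_h, a_h) \le \bar{c} \quad \text{for all } h = 1, \dots, H, \\
  \text{expected cumulative cost constraint:}\quad
    & \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} c_h(s_h, a_h)\right] \le \bar{C}.
\end{align*}

An algorithm designed only for the second type may satisfy the bound on average while still visiting unsafe states or taking unsafe actions at individual steps, which is the failure mode the paper targets.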

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-shi23c,
  title     = {A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints},
  author    = {Shi, Ming and Liang, Yingbin and Shroff, Ness},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {31243--31268},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/shi23c/shi23c.pdf},
  url       = {https://proceedings.mlr.press/v202/shi23c.html}
}
Endnote
%0 Conference Paper
%T A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints
%A Ming Shi
%A Yingbin Liang
%A Ness Shroff
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-shi23c
%I PMLR
%P 31243--31268
%U https://proceedings.mlr.press/v202/shi23c.html
%V 202
APA
Shi, M., Liang, Y. & Shroff, N. (2023). A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:31243-31268. Available from https://proceedings.mlr.press/v202/shi23c.html.
