Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Adam Stooke, Joshua Achiam, Pieter Abbeel
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9133-9143, 2020.

Abstract

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training. We address this shortcoming by proposing a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. We take a controls perspective, wherein the traditional Lagrange multiplier update behaves as integral control; our terms introduce proportional and derivative control, achieving favorable learning dynamics through damping and predictive measures. We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller tuning by providing invariance to the relative numerical scales of reward and cost. Our extensive experiments demonstrate improved performance and hyperparameter robustness, while our algorithms remain nearly as simple to derive and implement as the traditional Lagrangian approach.
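The abstract describes the multiplier update only at a high level. The minimal Python sketch below illustrates how a PID-controlled Lagrange multiplier might be maintained, assuming the constraint signal is a per-iteration estimate of episodic cost compared against a fixed cost limit; the class name, gain values, and clamping choices are illustrative assumptions, not the paper's exact algorithm.

class PIDLagrangianMultiplier:
    """PID-controlled Lagrange multiplier for one cost constraint (illustrative sketch)."""

    def __init__(self, cost_limit, kp=0.1, ki=0.01, kd=0.1):
        # kp, ki, kd are assumed gains; the traditional Lagrangian method
        # corresponds to kp = kd = 0, i.e. pure integral control.
        self.cost_limit = cost_limit
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_cost = 0.0

    def update(self, episodic_cost):
        # Constraint violation: positive when measured cost exceeds the limit.
        error = episodic_cost - self.cost_limit
        # Integral term: accumulated violation, clamped to stay non-negative.
        self.integral = max(0.0, self.integral + error)
        # Derivative term: reacts only to rising cost, anticipating overshoot.
        derivative = max(0.0, episodic_cost - self.prev_cost)
        self.prev_cost = episodic_cost
        # Projected multiplier: non-negative combination of the three terms.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


# Example use inside a training loop (values are placeholders):
pid = PIDLagrangianMultiplier(cost_limit=25.0)
lam = pid.update(episodic_cost=30.0)
# Penalized objective: reward - lam * cost, optionally divided by (1 + lam)
# so the objective's overall scale does not grow with the multiplier.

The integral term alone recovers the traditional Lagrangian update; the proportional and derivative terms are what add damping and anticipation. Normalizing the penalized objective by 1 + lambda, as in the final comment, is one way to obtain the reward/cost scale invariance mentioned in the abstract.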

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-stooke20a,
  title     = {Responsive Safety in Reinforcement Learning by {PID} Lagrangian Methods},
  author    = {Stooke, Adam and Achiam, Joshua and Abbeel, Pieter},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9133--9143},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/stooke20a/stooke20a.pdf},
  url       = {https://proceedings.mlr.press/v119/stooke20a.html}
}
Endnote
%0 Conference Paper
%T Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
%A Adam Stooke
%A Joshua Achiam
%A Pieter Abbeel
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-stooke20a
%I PMLR
%P 9133--9143
%U https://proceedings.mlr.press/v119/stooke20a.html
%V 119
APA
Stooke, A., Achiam, J. & Abbeel, P. (2020). Responsive Safety in Reinforcement Learning by PID Lagrangian Methods. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9133-9143. Available from https://proceedings.mlr.press/v119/stooke20a.html.
