Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks

Liyuan Zheng, Yuanyuan Shi, Lillian J. Ratliff, Baosen Zhang
Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:336-347, 2021.

Abstract

This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to the policy during learning. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs, and it offers no safety guarantee once the projection step is removed after training. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with safety guarantees during both the exploration and execution stages, achieved by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as a convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms projection-based reinforcement learning methods.
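
To make the construction concrete, below is a minimal sketch of the vertex-based policy idea, not the authors' implementation: the policy network outputs softmax weights over pre-computed vertices of a convex safe action set, so every action is a convex combination of safe vertices and hence safe by construction. The class name VertexPolicy, the network sizes, and the fixed box-shaped safe set are illustrative assumptions; the state-dependent vertex computation for the control-affine dynamics described in the paper is assumed to be handled elsewhere.

    # Minimal sketch (assumed PyTorch implementation, not the authors' code).
    import torch
    import torch.nn as nn

    class VertexPolicy(nn.Module):
        def __init__(self, state_dim: int, vertices: torch.Tensor, hidden: int = 64):
            """vertices: (num_vertices, action_dim) pre-computed vertices of the safe convex set."""
            super().__init__()
            self.register_buffer("vertices", vertices)
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, vertices.shape[0]),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            # Softmax yields non-negative weights summing to one, i.e. convex
            # combination coefficients over the safe vertices.
            weights = torch.softmax(self.net(state), dim=-1)  # (batch, num_vertices)
            return weights @ self.vertices                    # (batch, action_dim)

    # Example: a 2-D action set given by the box [-1, 1]^2 (its four vertices).
    vertices = torch.tensor([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
    policy = VertexPolicy(state_dim=4, vertices=vertices)
    action = policy(torch.randn(1, 4))  # lies inside the box by construction

Because the output is a convex combination of safe vertices, no projection or per-step optimization is needed at execution time, which is the computational advantage highlighted in the abstract.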

Cite this Paper


BibTeX
@InProceedings{pmlr-v144-zheng21a,
  title     = {Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks},
  author    = {Zheng, Liyuan and Shi, Yuanyuan and Ratliff, Lillian J. and Zhang, Baosen},
  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  pages     = {336--347},
  year      = {2021},
  editor    = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
  volume    = {144},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 June},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v144/zheng21a/zheng21a.pdf},
  url       = {https://proceedings.mlr.press/v144/zheng21a.html}
}
APA
Zheng, L., Shi, Y., Ratliff, L.J. & Zhang, B. (2021). Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:336-347. Available from https://proceedings.mlr.press/v144/zheng21a.html.
