Tractable Reinforcement Learning of Signal Temporal Logic Objectives

Harish Venkataraman, Derya Aksaray, Peter Seiler
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:308-317, 2020.

Abstract

Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations.
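The state-space growth the abstract describes can be made concrete with a toy sketch. This is a hypothetical illustration under assumed names and an assumed recurrent task ("visit region A at least once in every window of T steps"), not the paper's exact construction: a naive learner stores the last T raw states to evaluate the formula, so its state space grows as |S|^T, while a compact augmentation keeps only what the formula needs, here a steps-since-last-visit counter, giving |S|·T augmented states.

```python
# Hypothetical illustration of the history/state-space trade-off described
# in the abstract; the task and function names are assumptions, not the
# paper's exact augmented-state construction.

def naive_size(num_states: int, horizon: int) -> int:
    """History window of the last `horizon` raw states: exponential growth."""
    return num_states ** horizon


def augmented_size(num_states: int, horizon: int) -> int:
    """(state, steps-since-visit counter) pairs: linear in the horizon."""
    return num_states * horizon


def step_counter(counter: int, in_region_a: bool, horizon: int):
    """Update the augmented component after one transition.

    The counter resets when region A is visited and increments otherwise;
    reaching `horizon` means the recurrent task is violated. Returns
    (new_counter, violated).
    """
    new_counter = 0 if in_region_a else counter + 1
    return new_counter, new_counter >= horizon


if __name__ == "__main__":
    S, T = 100, 10
    print(naive_size(S, T))      # 100**10 states for a raw history window
    print(augmented_size(S, T))  # 1000 augmented states
```

The counter is enough to decide satisfaction of this particular formula, which is why the augmentation can discard the raw history; richer STL formulas would need a correspondingly richer (but still compact) augmented component.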

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-venkataraman20a,
  title = {Tractable Reinforcement Learning of Signal Temporal Logic Objectives},
  author = {Venkataraman, Harish and Aksaray, Derya and Seiler, Peter},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages = {308--317},
  year = {2020},
  editor = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume = {120},
  series = {Proceedings of Machine Learning Research},
  month = {10--11 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v120/venkataraman20a/venkataraman20a.pdf},
  url = {https://proceedings.mlr.press/v120/venkataraman20a.html},
  abstract = {Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations.}
}
Endnote
%0 Conference Paper
%T Tractable Reinforcement Learning of Signal Temporal Logic Objectives
%A Harish Venkataraman
%A Derya Aksaray
%A Peter Seiler
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-venkataraman20a
%I PMLR
%P 308--317
%U https://proceedings.mlr.press/v120/venkataraman20a.html
%V 120
%X Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations.
APA
Venkataraman, H., Aksaray, D. & Seiler, P. (2020). Tractable Reinforcement Learning of Signal Temporal Logic Objectives. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:308-317. Available from https://proceedings.mlr.press/v120/venkataraman20a.html.