Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping

Ningyuan Zhang, Wenliang Liu, Calin Belta
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:751-762, 2022.

Abstract

We present a computational framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment. The goal is to cooperatively satisfy specifications given as Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the synthesis problem as a stochastic game and employs a policy graph method to find a control strategy with memory for each agent. We construct the stochastic game on the product between the team transition system and a finite state automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the FSA to guide and accelerate the learning process. Simulation results demonstrate the efficacy of the proposed solution under demanding task specifications and the effectiveness of reward shaping in significantly accelerating the speed of learning.
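The core idea described above, using an automaton that tracks task progress to reshape the reward, can be illustrated with a small sketch. This is not the paper's code: the FSA below is a toy one for a hypothetical sequential task ("visit region A, then region B"), and the shaping function, potential, and discount factor are illustrative assumptions in the style of potential-based reward shaping.

```python
# Hypothetical sketch (not the authors' implementation): potential-based
# reward shaping driven by a finite state automaton (FSA) that tracks
# progress toward satisfying a temporal-logic task.
# Toy task: "visit region A, then region B".

GAMMA = 0.99  # discount factor (assumed)

# FSA states: 0 = nothing done, 1 = A visited, 2 = A then B visited (accepting)
FSA_TRANSITIONS = {
    (0, "A"): 1,
    (1, "B"): 2,
}
ACCEPTING = {2}

# Graph distance (in FSA edges) from each state to the accepting state.
DIST_TO_ACCEPT = {0: 2, 1: 1, 2: 0}

def fsa_step(q, label):
    """Advance the FSA on the observed region label; stay put otherwise."""
    return FSA_TRANSITIONS.get((q, label), q)

def potential(q):
    """Higher potential for FSA states closer to acceptance."""
    return -DIST_TO_ACCEPT[q]

def shaped_reward(base_reward, q, q_next):
    """Potential-based shaping: r' = r + gamma * phi(q') - phi(q)."""
    return base_reward + GAMMA * potential(q_next) - potential(q)
```

Each FSA transition toward acceptance yields a positive shaping bonus, so the learner is rewarded for task progress long before the full specification is satisfied, which is the mechanism the abstract credits for accelerating learning.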

Cite this Paper

BibTeX
@InProceedings{pmlr-v168-zhang22b,
  title =     {Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping},
  author =    {Zhang, Ningyuan and Liu, Wenliang and Belta, Calin},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages =     {751--762},
  year =      {2022},
  editor =    {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume =    {168},
  series =    {Proceedings of Machine Learning Research},
  month =     {23--24 Jun},
  publisher = {PMLR},
  pdf =       {https://proceedings.mlr.press/v168/zhang22b/zhang22b.pdf},
  url =       {https://proceedings.mlr.press/v168/zhang22b.html},
  abstract =  {We present a computational framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment. The goal is to cooperatively satisfy specifications given as Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the synthesis problem as a stochastic game and employs a policy graph method to find a control strategy with memory for each agent. We construct the stochastic game on the product between the team transition system and a finite state automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the FSA to guide and accelerate the learning process. Simulation results demonstrate the efficacy of the proposed solution under demanding task specifications and the effectiveness of reward shaping in significantly accelerating the speed of learning.}
}
Endnote
%0 Conference Paper
%T Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping
%A Ningyuan Zhang
%A Wenliang Liu
%A Calin Belta
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-zhang22b
%I PMLR
%P 751--762
%U https://proceedings.mlr.press/v168/zhang22b.html
%V 168
%X We present a computational framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment. The goal is to cooperatively satisfy specifications given as Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the synthesis problem as a stochastic game and employs a policy graph method to find a control strategy with memory for each agent. We construct the stochastic game on the product between the team transition system and a finite state automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the FSA to guide and accelerate the learning process. Simulation results demonstrate the efficacy of the proposed solution under demanding task specifications and the effectiveness of reward shaping in significantly accelerating the speed of learning.
APA
Zhang, N., Liu, W., & Belta, C. (2022). Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:751-762. Available from https://proceedings.mlr.press/v168/zhang22b.html.