Regret-Free Reinforcement Learning for Temporal Logic Specifications

R Majumdar, Mahmoud Salamati, Sadegh Soudjani
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:42691-42711, 2025.

Abstract

Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDPs). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-majumdar25a, title = {Regret-Free Reinforcement Learning for Temporal Logic Specifications}, author = {Majumdar, R and Salamati, Mahmoud and Soudjani, Sadegh}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {42691--42711}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/majumdar25a/majumdar25a.pdf}, url = {https://proceedings.mlr.press/v267/majumdar25a.html}, abstract = {Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDPs). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.} }
Endnote
%0 Conference Paper %T Regret-Free Reinforcement Learning for Temporal Logic Specifications %A R Majumdar %A Mahmoud Salamati %A Sadegh Soudjani %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-majumdar25a %I PMLR %P 42691--42711 %U https://proceedings.mlr.press/v267/majumdar25a.html %V 267 %X Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDPs). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.
APA
Majumdar, R., Salamati, M. & Soudjani, S.. (2025). Regret-Free Reinforcement Learning for Temporal Logic Specifications. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:42691-42711 Available from https://proceedings.mlr.press/v267/majumdar25a.html.

Related Material