Provably Correct Automata Embeddings for Optimal Automata-Conditioned Reinforcement Learning
Proceedings of the International Conference on Neuro-symbolic Systems, PMLR 288:661-675, 2025.
Abstract
Automata-conditioned reinforcement learning (RL) has shown promising results for learning multi-task policies that can accomplish temporally extended objectives specified at runtime. It does so by pretraining and freezing automata embeddings before training the downstream policy. However, no theoretical guarantees have been given for this approach. This work provides a theoretical framework for the automata-conditioned RL problem and shows that it is probably approximately correct (PAC) learnable. We then present a technique for learning provably correct automata embeddings, which guarantees optimal multi-task policy learning. Our experimental evaluation confirms these theoretical results.