Bidirectional End-to-End Framework for Transfer from Abstract Models in Non-Markovian Reinforcement Learning
Proceedings of the International Conference on Neuro-symbolic Systems, PMLR 288:643-660, 2025.
Abstract
We propose a bidirectional end-to-end reinforcement learning (RL) framework for solving complex non-Markovian tasks in discrete and continuous environments. Rather than learning policies directly in the high-dimensional original domain, we first construct a simplified teacher model from offline trajectories to serve as a surrogate environment. In parallel, we infer a deterministic finite automaton (DFA) with the RPNI algorithm to capture the task's temporal dependencies. A policy is learned in the surrogate environment and transferred to the original domain via automaton distillation, which guides policy learning more effectively than direct RL in the original environment. Our framework integrates DQN for discrete tasks and DDPG/TD3 for continuous control. Empirical results demonstrate that this structured transfer significantly improves learning efficiency and convergence speed, outperforming standard RL baselines.
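To make the transfer step concrete, the following is a minimal, hypothetical Python sketch of one plausible form of automaton-guided transfer: an RPNI-inferred DFA tracks non-Markovian task progress over event labels, and a potential function over its states, standing in for value estimates distilled from the teacher, shapes the student's reward. The class and function names and the potential-based shaping form are illustrative assumptions, not the paper's exact mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class DFA:
    """Deterministic finite automaton over event labels (e.g. from RPNI)."""
    transitions: dict                      # (state, label) -> next state
    initial: int = 0
    accepting: frozenset = frozenset()
    state: int = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        # Unlisted (state, label) pairs self-loop: irrelevant events
        # do not change task progress.
        self.state = self.transitions.get((self.state, label), self.state)
        return self.state


def shaped_reward(r_env, q_prev, q_next, phi, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(q') - phi(q).

    phi maps DFA states to potentials; here these are assumed to be
    value estimates distilled from the teacher policy learned in the
    surrogate environment (hypothetical stand-ins below).
    """
    return r_env + gamma * phi[q_next] - phi[q_prev]


# Toy non-Markovian task: observe event "a", then event "b".
dfa = DFA(transitions={(0, "a"): 1, (1, "b"): 2}, accepting=frozenset({2}))
phi = {0: 0.0, 1: 0.5, 2: 1.0}  # stand-in for distilled teacher values

q_prev = dfa.state           # 0
q_next = dfa.step("a")       # 1: one step of task progress
print(shaped_reward(0.0, q_prev, q_next, phi))  # 0.495
```

One reason a shaping-style formulation is attractive here: potential-based shaping is known to preserve the optimal policy of the underlying task, so an automaton-derived signal of this form can accelerate learning without biasing what the student ultimately converges to.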