Bidirectional End-to-End Framework for Transfer from Abstract Models in Non-Markovian Reinforcement Learning

Mahyar Alinejad, Precious Nwaorgu, Chinwendu Enyioha, Yue Wang, Alvaro Velasquez, George Atia
Proceedings of the International Conference on Neuro-symbolic Systems, PMLR 288:643-660, 2025.

Abstract

We propose a bidirectional end-to-end reinforcement learning (RL) framework for solving complex non-Markovian tasks in discrete and continuous environments. Instead of directly learning policies in high-dimensional spaces, we first construct a simplified teacher model as a surrogate environment from offline trajectories. Simultaneously, we infer a Deterministic Finite Automaton (DFA) using the RPNI algorithm to capture task dependencies. A policy is learned in the surrogate environment and transferred to the original domain via automaton distillation, which guides policy learning more effectively than direct RL in the original environment. Our framework integrates DQN for discrete tasks and DDPG/TD3 for continuous settings. Empirical results demonstrate that this structured transfer significantly improves learning efficiency and convergence speed, outperforming standard RL baselines.
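To make the abstract's pipeline concrete, here is a minimal, hypothetical sketch (not the authors' code) of the central idea: an inferred DFA tracks progress on a non-Markovian task, and its transitions supply the shaping signal while an agent learns on the product of environment and automaton states. The toy corridor environment, the labeling function, and tabular Q-learning are illustrative assumptions; the paper itself uses RPNI-inferred automata with DQN for discrete tasks and DDPG/TD3 for continuous ones.

import random
from collections import defaultdict

class DFA:
    """Deterministic finite automaton over propositional labels."""
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions      # dict: (state, symbol) -> next state
        self.start = start
        self.accepting = accepting

    def step(self, q, symbol):
        # Stay in the current state on symbols with no outgoing edge.
        return self.transitions.get((q, symbol), q)

# Toy non-Markovian task: pick up the "key" before reaching the "door".
dfa = DFA(transitions={(0, "key"): 1, (1, "door"): 2}, start=0, accepting={2})

def label(cell):
    # Illustrative labeling function mapping grid cells to propositions.
    return {3: "key", 7: "door"}.get(cell, "none")

# Tabular Q-learning on the product of environment state and DFA state;
# DFA progress provides the shaping signal that a distilled teacher would supply.
Q = defaultdict(float)
actions = [-1, +1]                      # move left / right on a 9-cell corridor
alpha, gamma, eps = 0.5, 0.95, 0.1

for episode in range(500):
    s, q = 0, dfa.start
    for t in range(50):
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a_: Q[(s, q, a_)])
        s_next = min(max(s + a, 0), 8)
        q_next = dfa.step(q, label(s_next))
        # Reward: terminal bonus on acceptance, small bonus for DFA progress.
        r = 1.0 if q_next in dfa.accepting else (0.1 if q_next != q else 0.0)
        best_next = max(Q[(s_next, q_next, a_)] for a_ in actions)
        Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
        s, q = s_next, q_next
        if q in dfa.accepting:
            break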

Cite this Paper


BibTeX
@InProceedings{pmlr-v288-alinejad25a,
  title     = {Bidirectional End-to-End Framework for Transfer from Abstract Models in Non-Markovian Reinforcement Learning},
  author    = {Alinejad, Mahyar and Nwaorgu, Precious and Enyioha, Chinwendu and Wang, Yue and Velasquez, Alvaro and Atia, George},
  booktitle = {Proceedings of the International Conference on Neuro-symbolic Systems},
  pages     = {643--660},
  year      = {2025},
  editor    = {Pappas, George and Ravikumar, Pradeep and Seshia, Sanjit A.},
  volume    = {288},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v288/main/assets/alinejad25a/alinejad25a.pdf},
  url       = {https://proceedings.mlr.press/v288/alinejad25a.html},
  abstract  = {We propose a bidirectional end-to-end reinforcement learning (RL) framework for solving complex non-Markovian tasks in discrete and continuous environments. Instead of directly learning policies in high-dimensional spaces, we first construct a simplified teacher model as a surrogate environment from offline trajectories. Simultaneously, we infer a Deterministic Finite Automaton (DFA) using the RPNI algorithm to capture task dependencies. A policy is learned in the surrogate environment and transferred to the original domain via automaton distillation, which guides policy learning more effectively than direct RL in the original environment. Our framework integrates DQN for discrete tasks and DDPG/TD3 for continuous settings. Empirical results demonstrate that this structured transfer significantly improves learning efficiency and convergence speed, outperforming standard RL baselines.}
}
Endnote
%0 Conference Paper
%T Bidirectional End-to-End Framework for Transfer from Abstract Models in Non-Markovian Reinforcement Learning
%A Mahyar Alinejad
%A Precious Nwaorgu
%A Chinwendu Enyioha
%A Yue Wang
%A Alvaro Velasquez
%A George Atia
%B Proceedings of the International Conference on Neuro-symbolic Systems
%C Proceedings of Machine Learning Research
%D 2025
%E George Pappas
%E Pradeep Ravikumar
%E Sanjit A. Seshia
%F pmlr-v288-alinejad25a
%I PMLR
%P 643--660
%U https://proceedings.mlr.press/v288/alinejad25a.html
%V 288
%X We propose a bidirectional end-to-end reinforcement learning (RL) framework for solving complex non-Markovian tasks in discrete and continuous environments. Instead of directly learning policies in high-dimensional spaces, we first construct a simplified teacher model as a surrogate environment from offline trajectories. Simultaneously, we infer a Deterministic Finite Automaton (DFA) using the RPNI algorithm to capture task dependencies. A policy is learned in the surrogate environment and transferred to the original domain via automaton distillation, which guides policy learning more effectively than direct RL in the original environment. Our framework integrates DQN for discrete tasks and DDPG/TD3 for continuous settings. Empirical results demonstrate that this structured transfer significantly improves learning efficiency and convergence speed, outperforming standard RL baselines.
APA
Alinejad, M., Nwaorgu, P., Enyioha, C., Wang, Y., Velasquez, A., & Atia, G. (2025). Bidirectional End-to-End Framework for Transfer from Abstract Models in Non-Markovian Reinforcement Learning. Proceedings of the International Conference on Neuro-symbolic Systems, in Proceedings of Machine Learning Research 288:643-660. Available from https://proceedings.mlr.press/v288/alinejad25a.html.
