Reward Translation via Reward Machine in Semi-Alignable MDPs

Yun Hua, Haosheng Chen, Wenhao Li, Bo Jin, Baoxiang Wang, Hongyuan Zha, Xiangfeng Wang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24912-24931, 2025.

Abstract

Knowledge transfer across domains can ease the complexity of reward design in deep reinforcement learning. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with incompatible MDPs that are non-pairable and non-time-alignable. This paper presents an adaptable reward translation framework, neural reward translation, built on semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created with limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach’s effectiveness on tasks in environments with semi-alignable MDPs.
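To make the pipeline described in the abstract concrete (reward machines as intermediaries, graph matching to align them, reward lookup across the alignment), here is a minimal illustrative sketch in Python. All names (RewardMachine, match_states, translate_reward) are hypothetical, the brute-force permutation search merely stands in for the graph-matching techniques the paper refers to, and a shared event vocabulary across domains is assumed; this is not the authors' implementation.

    from dataclasses import dataclass, field
    from itertools import permutations

    @dataclass
    class RewardMachine:
        # Automaton over high-level events: states, (state, event) -> next state,
        # and an optional (state, event) -> reward map.
        states: list
        transitions: dict
        rewards: dict = field(default_factory=dict)

    def adjacency(rm):
        # Unlabeled successor structure used for matching.
        adj = {u: set() for u in rm.states}
        for (u, _event), v in rm.transitions.items():
            adj[u].add(v)
        return adj

    def match_states(rm_a, rm_b):
        # Brute-force graph matching: pick the state correspondence that
        # preserves the most edges. Fine for tiny machines; a real
        # graph-matching solver would be needed at scale. Assumes rm_b
        # has at least as many states as rm_a.
        adj_a, adj_b = adjacency(rm_a), adjacency(rm_b)
        best, best_score = None, -1
        for perm in permutations(rm_b.states, len(rm_a.states)):
            mapping = dict(zip(rm_a.states, perm))
            score = sum(1 for u in adj_a for v in adj_a[u]
                        if mapping[v] in adj_b[mapping[u]])
            if score > best_score:
                best, best_score = mapping, score
        return best

    def translate_reward(rm_src, mapping, tgt_state, event):
        # Look up the source-domain reward for a target-domain transition
        # by inverting the state correspondence (shared event names assumed).
        inverse = {v: k for k, v in mapping.items()}
        src_state = inverse.get(tgt_state)
        return rm_src.rewards.get((src_state, event), 0.0)

    # Toy demo: two structurally identical two-state machines.
    rm_grid = RewardMachine(
        states=["g0", "g1"],
        transitions={("g0", "subgoal"): "g1", ("g1", "goal"): "g0"},
        rewards={("g1", "goal"): 1.0},
    )
    rm_arm = RewardMachine(
        states=["a0", "a1"],
        transitions={("a0", "subgoal"): "a1", ("a1", "goal"): "a0"},
    )
    mapping = match_states(rm_grid, rm_arm)  # {'g0': 'a0', 'g1': 'a1'}
    print(translate_reward(rm_grid, mapping, "a1", "goal"))  # 1.0

In this toy setup the target environment inherits the source reward for its "goal" event through the matched states, which is the essence of reward translation when the two MDPs cannot be paired or time-aligned directly.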

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-hua25a,
  title     = {Reward Translation via Reward Machine in Semi-Alignable {MDP}s},
  author    = {Hua, Yun and Chen, Haosheng and Li, Wenhao and Jin, Bo and Wang, Baoxiang and Zha, Hongyuan and Wang, Xiangfeng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {24912--24931},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hua25a/hua25a.pdf},
  url       = {https://proceedings.mlr.press/v267/hua25a.html},
  abstract  = {Knowledge transfer across domains can ease the complexity of reward design in deep reinforcement learning. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with incompatible MDPs that are non-pairable and non-time-alignable. This paper presents an adaptable reward translation framework, neural reward translation, built on semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created with limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach’s effectiveness on tasks in environments with semi-alignable MDPs.}
}
Endnote
%0 Conference Paper
%T Reward Translation via Reward Machine in Semi-Alignable MDPs
%A Yun Hua
%A Haosheng Chen
%A Wenhao Li
%A Bo Jin
%A Baoxiang Wang
%A Hongyuan Zha
%A Xiangfeng Wang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hua25a
%I PMLR
%P 24912--24931
%U https://proceedings.mlr.press/v267/hua25a.html
%V 267
%X Knowledge transfer across domains can ease the complexity of reward design in deep reinforcement learning. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with incompatible MDPs that are non-pairable and non-time-alignable. This paper presents an adaptable reward translation framework, neural reward translation, built on semi-alignable MDPs, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created with limited human input or LLM-based automated learning. Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach’s effectiveness on tasks in environments with semi-alignable MDPs.
APA
Hua, Y., Chen, H., Li, W., Jin, B., Wang, B., Zha, H. & Wang, X. (2025). Reward Translation via Reward Machine in Semi-Alignable MDPs. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:24912-24931. Available from https://proceedings.mlr.press/v267/hua25a.html.
