Reward Shaping for Reinforcement Learning with An Assistant Reward Agent

Haozhe Ma, Kuankuan Sima, Thanh Vinh Vo, Di Fu, Tze-Yun Leong
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33925-33939, 2024.

Abstract

Reward shaping is a promising approach to tackle the sparse-reward challenge of reinforcement learning by reconstructing more informative and dense rewards. This paper introduces a novel dual-agent reward shaping framework, composed of two synergistic agents: a policy agent to learn the optimal behavior and a reward agent to generate auxiliary reward signals. The proposed method operates as a self-learning approach, without reliance on expert knowledge or hand-crafted functions. By restructuring the rewards to capture future-oriented information, our framework effectively enhances the sample efficiency and convergence stability. Furthermore, the auxiliary reward signals facilitate the exploration of the environment in the early stage and the exploitation of the policy agent in the late stage, achieving a self-adaptive balance. We evaluate our framework on continuous control tasks with sparse and delayed rewards, demonstrating its robustness and superiority over existing methods.
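To make the dual-agent setup concrete, the sketch below illustrates one way such an arrangement could be wired: a learned reward agent scores (state, action) pairs, and its output is added to the sparse environment reward that the policy agent trains on. This is only an illustrative PyTorch sketch; the class names, the additive combination, and the beta weight are assumptions made here for exposition, not the authors' method or code.

    import torch
    import torch.nn as nn

    class RewardAgent(nn.Module):
        # Hypothetical reward agent: maps a (state, action) pair to a scalar
        # auxiliary reward. Architecture and names are illustrative only.
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def shaped_reward(env_reward, reward_agent, obs, act, beta=1.0):
        # Combine the sparse environment reward with the learned auxiliary term.
        # beta (an assumed hyperparameter) controls how much the auxiliary
        # signal is trusted; annealing it is one way to shift from early
        # exploration toward late-stage exploitation.
        with torch.no_grad():
            aux = reward_agent(obs, act)
        return float(env_reward) + beta * aux.item()

    # Example usage on dummy tensors (obs_dim=8, act_dim=2):
    reward_agent = RewardAgent(obs_dim=8, act_dim=2)
    obs, act = torch.zeros(8), torch.zeros(2)
    r_shaped = shaped_reward(0.0, reward_agent, obs, act, beta=0.5)

In this sketch the policy agent (any standard actor-critic learner) would be trained on r_shaped rather than the raw sparse reward, while the reward agent itself is updated from the policy agent's experience; how that update is performed is the paper's contribution and is not reproduced here.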

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ma24l,
  title     = {Reward Shaping for Reinforcement Learning with An Assistant Reward Agent},
  author    = {Ma, Haozhe and Sima, Kuankuan and Vo, Thanh Vinh and Fu, Di and Leong, Tze-Yun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {33925--33939},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ma24l/ma24l.pdf},
  url       = {https://proceedings.mlr.press/v235/ma24l.html},
  abstract  = {Reward shaping is a promising approach to tackle the sparse-reward challenge of reinforcement learning by reconstructing more informative and dense rewards. This paper introduces a novel dual-agent reward shaping framework, composed of two synergistic agents: a policy agent to learn the optimal behavior and a reward agent to generate auxiliary reward signals. The proposed method operates as a self-learning approach, without reliance on expert knowledge or hand-crafted functions. By restructuring the rewards to capture future-oriented information, our framework effectively enhances the sample efficiency and convergence stability. Furthermore, the auxiliary reward signals facilitate the exploration of the environment in the early stage and the exploitation of the policy agent in the late stage, achieving a self-adaptive balance. We evaluate our framework on continuous control tasks with sparse and delayed rewards, demonstrating its robustness and superiority over existing methods.}
}
Endnote
%0 Conference Paper
%T Reward Shaping for Reinforcement Learning with An Assistant Reward Agent
%A Haozhe Ma
%A Kuankuan Sima
%A Thanh Vinh Vo
%A Di Fu
%A Tze-Yun Leong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ma24l
%I PMLR
%P 33925--33939
%U https://proceedings.mlr.press/v235/ma24l.html
%V 235
%X Reward shaping is a promising approach to tackle the sparse-reward challenge of reinforcement learning by reconstructing more informative and dense rewards. This paper introduces a novel dual-agent reward shaping framework, composed of two synergistic agents: a policy agent to learn the optimal behavior and a reward agent to generate auxiliary reward signals. The proposed method operates as a self-learning approach, without reliance on expert knowledge or hand-crafted functions. By restructuring the rewards to capture future-oriented information, our framework effectively enhances the sample efficiency and convergence stability. Furthermore, the auxiliary reward signals facilitate the exploration of the environment in the early stage and the exploitation of the policy agent in the late stage, achieving a self-adaptive balance. We evaluate our framework on continuous control tasks with sparse and delayed rewards, demonstrating its robustness and superiority over existing methods.
APA
Ma, H., Sima, K., Vo, T. V., Fu, D., & Leong, T.-Y. (2024). Reward Shaping for Reinforcement Learning with An Assistant Reward Agent. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33925-33939. Available from https://proceedings.mlr.press/v235/ma24l.html.
