To the Max: Reinventing Reward in Reinforcement Learning

Grigorii Veviurko; Wendelin Boehmer; Mathijs De Weerdt

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49455-49470, 2024.

Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
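The following Python sketch is an informal illustration of the difference between the two objectives. It is not the authors' implementation; their code is in the linked repository. It scores two hypothetical reward sequences under the standard discounted cumulative return (sum over t of gamma^t * r_t) and under a discounted maximum (max over t of gamma^t * r_t). Several things here are assumptions chosen for the example: the discounted form of the max objective, gamma = 0.99, and the shaped reward values. Trajectory A reaches the goal in three steps. Trajectory B stays close to the goal for twenty steps without ever reaching it.

```python
from typing import Sequence

# Illustrative sketch only (assumption): compares the usual discounted
# cumulative return with a discounted maximum-reward objective.


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Cumulative objective: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def discounted_max_reward(rewards: Sequence[float], gamma: float) -> float:
    """Max-reward objective (assumed discounted form): max over t of gamma**t * r_t."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    return max((gamma ** t) * r for t, r in enumerate(rewards))


GAMMA = 0.99  # hypothetical discount factor

# Hypothetical shaped rewards in [0, 1], where 1.0 means "goal reached".
reach_goal = [0.5, 0.8, 1.0]           # A: reaches the goal at t = 2, episode ends
linger_near_goal = [0.5] + [0.9] * 20  # B: stays near the goal, never reaches it

for name, rewards in [("A", reach_goal), ("B", linger_near_goal)]:
    print(
        f"{name}: cumulative={discounted_return(rewards, GAMMA):.4f} "
        f"max={discounted_max_reward(rewards, GAMMA):.4f}"
    )
```

Under these made-up rewards, the cumulative return ranks the lingering trajectory B above A because it collects many moderately high rewards. The max objective ranks A above B because only the best discounted reward along the trajectory counts. This is one way a shaped reward can make a cumulative-reward agent settle for suboptimal behavior.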

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-veviurko24a,
  title = 	 {To the Max: Reinventing Reward in Reinforcement Learning},
  author =       {Veviurko, Grigorii and Boehmer, Wendelin and De Weerdt, Mathijs},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {49455--49470},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/veviurko24a/veviurko24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/veviurko24a.html},
  abstract = 	 {In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.}
}

Endnote

%0 Conference Paper
%T To the Max: Reinventing Reward in Reinforcement Learning
%A Grigorii Veviurko
%A Wendelin Boehmer
%A Mathijs De Weerdt
%B Proceedings of the 41st International Conference on Machine Learning
%S Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-veviurko24a
%I PMLR
%P 49455--49470
%U https://proceedings.mlr.press/v235/veviurko24a.html
%V 235
%X In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.

APA

Veviurko, G., Boehmer, W., & De Weerdt, M. (2024). To the Max: Reinventing Reward in Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:49455-49470. Available from https://proceedings.mlr.press/v235/veviurko24a.html.
