Reinforcement Learning with Adaptive Reward Modeling for Expensive-to-Evaluate Systems
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:57131-57143, 2025.
Abstract
Training reinforcement learning (RL) agents requires extensive trial and error, which becomes prohibitively time-consuming in systems with costly reward evaluations. To address this challenge, we propose adaptive reward modeling (AdaReMo), which accelerates RL training by decomposing the complicated reward function into multiple localized, fast reward models that approximate direct reward evaluation with neural networks. These models dynamically adapt to the agent’s evolving policy by fitting the currently explored subspace with the latest trajectories, ensuring accurate reward estimation throughout training while significantly reducing computational overhead. We empirically show that AdaReMo not only achieves a speedup of over 1,000 times but also improves performance by 14.6% over state-of-the-art approaches across three expensive-to-evaluate systems: molecular generation, epidemic control, and spatial planning. Code and data for the project are provided at https://github.com/tsinghua-fib-lab/AdaReMo.
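
The loop the abstract describes, periodically refitting a cheap neural surrogate of the expensive reward on the latest trajectories and using it for RL updates, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RewardModel architecture, the refit cadence, the buffer size, and the placeholder expensive_reward function are all hypothetical choices made here for concreteness.

import torch
import torch.nn as nn
from collections import deque

class RewardModel(nn.Module):
    """Small MLP that approximates the expensive reward function (assumed architecture)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)

def refit(model: RewardModel, states: torch.Tensor, rewards: torch.Tensor,
          epochs: int = 50, lr: float = 1e-3) -> None:
    """Fit the fast reward model to recent (state, reward) pairs, i.e. the
    subspace the current policy is exploring."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(states), rewards)
        loss.backward()
        opt.step()

def expensive_reward(s: torch.Tensor) -> torch.Tensor:
    # Stand-in for a costly evaluation (simulator, docking score, etc.).
    return torch.sin(s).sum(-1)

state_dim = 8
model = RewardModel(state_dim)
recent = deque(maxlen=2048)  # keep only the latest trajectories

for step in range(1000):
    s = torch.randn(32, state_dim)        # states visited by the current policy
    if step % 100 == 0:                   # evaluate the true reward only sparingly
        recent.extend(zip(s, expensive_reward(s)))
        states = torch.stack([x for x, _ in recent])
        rewards = torch.stack([y for _, y in recent])
        refit(model, states, rewards)     # adapt the surrogate to the explored subspace
    with torch.no_grad():
        r_hat = model(s)                  # cheap reward estimate used for RL updates

The key design choice this sketch reflects is that the surrogate is retrained only on recent trajectories, so it stays accurate on the region of state space the evolving policy actually visits rather than attempting a global fit.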