Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:2171-2206, 2026.
Abstract
Adversarial inverse reinforcement learning (IRL) has shown promising results using reward shaping in deterministic settings. However, it struggles in stochastic environments, where existing theoretical results no longer apply, leading to degraded performance. To address this issue, we propose a novel off-policy IRL method, based on maximum causal entropy, with a transition-aware reward shaping framework. Our method integrates transition model estimation directly into reward shaping to learn stochastic-invariant rewards. We conduct a thorough theoretical analysis, establishing bounds on the reward error and the performance difference to validate the effectiveness of our method. Experiments on continuous locomotion tasks (MuJoCo) show that our method achieves superior performance in stochastic environments and competitive performance in deterministic environments, with a significant improvement in sample efficiency over existing baselines. Additionally, we extend our framework to high-dimensional vision-based tasks, where our method shows promising results on multiple stochastic Atari games. These results demonstrate that embedding transition awareness into reward learning is critical for robust IRL in realistic stochastic settings.
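The abstract does not state the shaping formula, so the following is only an illustrative sketch of why next-state randomness matters for reward shaping, not the method of the paper. It assumes the shaping form commonly used in adversarial IRL, r(s, a) + γ h(s') − h(s), and contrasts it with a variant that averages the potential h over a transition model T(s' | s, a). The toy states, reward, potential, and transition probabilities are all hypothetical.

```python
GAMMA = 0.9

# Hypothetical 3-state toy problem; every number here is made up for illustration.
REWARD = {(0, 0): 1.0}                   # r(s, a)
POTENTIAL = [0.0, 2.0, -1.0]             # h(s), the shaping potential
TRANSITION = {(0, 0): [0.0, 0.5, 0.5]}   # T(s' | s, a), assumed known or estimated


def sampled_shaping(s, a, s_next):
    """Shaped reward computed from the single observed next state."""
    return REWARD[(s, a)] + GAMMA * POTENTIAL[s_next] - POTENTIAL[s]


def expected_shaping(s, a):
    """Shaped reward with the next-state potential averaged under T(. | s, a)."""
    expected_h = sum(p * h for p, h in zip(TRANSITION[(s, a)], POTENTIAL))
    return REWARD[(s, a)] + GAMMA * expected_h - POTENTIAL[s]


if __name__ == "__main__":
    # The same (s, a) pair gets different sampled values depending on s'.
    print(sampled_shaping(0, 0, 1), sampled_shaping(0, 0, 2))
    # The expectation-based value depends only on (s, a).
    print(expected_shaping(0, 0))
```

In this toy case the sampled value for the same state-action pair changes with the observed next state, while the expectation-based value depends only on (s, a). That is the kind of dependence on transition noise that a transition-aware shaping term is meant to remove.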