Beyond Information Sufficiency: Observation-Action Space Alignment in Robotic Reinforcement Learning
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1052-1059, 2026.
Abstract
Observation design is a fundamental yet under-specified component of robotic reinforcement learning (RL). While classical theory emphasizes that observations should be informationally sufficient, we show, through a focused reaching case study, that sufficiency alone does not guarantee learnability or sim-to-real transfer. Using PPO on a 6-DOF Kinova Gen3 Lite arm, we demonstrate that two observation spaces with equal dimensionality and theoretically equivalent information content (9D joint-based vs. 9D Cartesian-based) differ by over 60 percentage points in success rate when paired with Cartesian velocity control. Aligned Cartesian observations consistently learn faster, achieve higher success, and transfer zero-shot to the physical robot, whereas misaligned joint observations fail despite being sufficient in principle. Our findings highlight representational alignment between observations, actions, and rewards as a first-order design constraint in robotic RL, demonstrated through controlled simulation and zero-shot real-world deployment.