Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33227-33242, 2025.
Abstract
The goal of offline reinforcement learning (RL) is to extract the best possible policy from a previously collected dataset while accounting for the out-of-distribution (OOD) sample issue. Offline model-based RL (MBRL) is an appealing solution that can alleviate this issue by augmenting state-action transitions with a learned dynamics model. Unfortunately, offline MBRL methods have long been observed to fail in sparse-reward, long-horizon environments. In this work, we propose a novel MBRL method, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), which generates additional transitions in a geometrically structured representation space rather than in the raw state space. To capture long-horizon behaviors efficiently, our main idea is to learn a state abstraction that reflects temporal distance at both the trajectory and transition levels of the state space. Our experiments empirically confirm that TempDATA outperforms previous offline MBRL methods and matches or surpasses the performance of diffusion-based trajectory augmentation and goal-conditioned RL on D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen.
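To make the core idea concrete, below is a minimal, illustrative PyTorch sketch of what temporal distance-aware representation learning and latent transition augmentation could look like. All names here (TemporalEncoder, LatentDynamics, temporal_distance_loss, augment_transitions) and the specific losses and network sizes are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: encode states so latent distances track temporal
# distance, then augment transitions in that latent space with a learned
# latent dynamics model. Names and objectives are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalEncoder(nn.Module):
    """Maps raw states to a latent space whose Euclidean distances are
    trained to reflect temporal distance (number of environment steps)."""
    def __init__(self, state_dim, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, s):
        return self.net(s)


class LatentDynamics(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))


def temporal_distance_loss(encoder, s_i, s_j, step_gap):
    """Trajectory-level objective (assumed): the latent distance between two
    states from the same trajectory should match their step separation."""
    d_latent = torch.norm(encoder(s_i) - encoder(s_j), dim=-1)
    return F.mse_loss(d_latent, step_gap)


def augment_transitions(encoder, dynamics, states, actions, noise_scale=0.1):
    """Transition-level augmentation (assumed): perturb latent states and
    roll the learned latent dynamics forward to synthesize new transitions."""
    with torch.no_grad():
        z = encoder(states)
        z_perturbed = z + noise_scale * torch.randn_like(z)
        z_next = dynamics(z_perturbed, actions)
    return z_perturbed, actions, z_next
```

In this sketch, the trajectory-level term shapes the geometry of the representation so that reaching distant goals corresponds to covering large latent distances, while the transition-level augmentation supplies additional model-generated data in that space for downstream offline policy learning.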