Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning

Dongsu Lee, Minhae Kwon
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33227-33242, 2025.

Abstract

The goal of offline reinforcement learning (RL) is to extract the best possible policy from a previously collected dataset while accounting for the out-of-distribution (OOD) sample issue. Offline model-based RL (MBRL) is an appealing solution that can alleviate this issue by augmenting state-action transitions with a learned dynamics model. Unfortunately, offline MBRL methods have long been observed to fail in sparse-reward and long-horizon environments. In this work, we propose a novel MBRL method, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), which generates additional transitions in a geometrically structured representation space rather than in the raw state space. To capture long-horizon behaviors efficiently, our main idea is to learn a state abstraction that encodes temporal distance at both the trajectory and transition levels of the state space. Our experiments empirically confirm that TempDATA outperforms previous offline MBRL methods and matches or surpasses the performance of diffusion-based trajectory augmentation and goal-conditioned RL on D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen.
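
To make the abstract's core idea concrete, below is a minimal illustrative sketch of what temporal-distance-aware representation learning and latent-space transition augmentation could look like. It is not the authors' implementation: the encoder and latent dynamics architectures, the squared-error temporal-distance objective, the noise-based augmentation step, and all hyperparameters are assumptions introduced here purely for illustration.

# Illustrative sketch only -- NOT the TempDATA implementation from the paper.
# Assumption: an encoder is trained so that the Euclidean distance between two
# latent states roughly matches the number of environment steps separating them
# within a trajectory; synthetic transitions are then generated in that latent
# space with a learned latent dynamics model instead of in raw state space.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw states to a temporal-distance-aware latent space (assumed MLP)."""
    def __init__(self, state_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, s):
        return self.net(s)

class LatentDynamics(nn.Module):
    """Predicts the next latent state from (latent state, action) -- assumed form."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def temporal_distance_loss(encoder, traj_states, num_pairs: int = 128):
    """Trajectory-level objective (assumed): latent distance ~ temporal gap in steps."""
    T = traj_states.shape[0]
    i = torch.randint(0, T, (num_pairs,))
    j = torch.randint(0, T, (num_pairs,))
    z_i, z_j = encoder(traj_states[i]), encoder(traj_states[j])
    latent_dist = torch.linalg.norm(z_i - z_j, dim=-1)
    temporal_dist = (i - j).abs().float()
    return ((latent_dist - temporal_dist) ** 2).mean()

def augment_transitions(encoder, dynamics, states, actions, noise_std: float = 0.1):
    """Generate synthetic transitions in latent space rather than raw state space."""
    with torch.no_grad():
        z = encoder(states)
        z_perturbed = z + noise_std * torch.randn_like(z)  # stay near the data manifold
        z_next = dynamics(z_perturbed, actions)
    return z_perturbed, actions, z_next

A training loop under these assumptions would combine the trajectory-level temporal-distance loss with a transition-level prediction loss for the latent dynamics model, and feed the augmented latent transitions to an off-the-shelf offline RL learner; the paper itself should be consulted for the actual objectives and augmentation procedure.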

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-lee25p,
  title     = {Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning},
  author    = {Lee, Dongsu and Kwon, Minhae},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {33227--33242},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lee25p/lee25p.pdf},
  url       = {https://proceedings.mlr.press/v267/lee25p.html},
  abstract  = {The goal of offline reinforcement learning (RL) is to extract the best possible policy from the previously collected dataset considering the out-of-distribution (OOD) sample issue. Offline model-based RL (MBRL) is a captivating solution capable of alleviating such issues through a state-action transition augmentation with a learned dynamic model. Unfortunately, offline MBRL methods have been observed to fail in sparse rewarded and long-horizon environments for a long time. In this work, we propose a novel MBRL method, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates additional transitions in a geometrically structured representation space, instead of state space. For comprehending long-horizon behaviors efficiently, our main idea is to learn state abstraction, which captures a temporal distance from both trajectory and transition levels of state space. Our experiments empirically confirm that TempDATA outperforms previous offline MBRL methods and achieves matching or surpassing the performance of diffusion-based trajectory augmentation and goal-conditioned RL on the D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen.}
}
Endnote
%0 Conference Paper
%T Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
%A Dongsu Lee
%A Minhae Kwon
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lee25p
%I PMLR
%P 33227--33242
%U https://proceedings.mlr.press/v267/lee25p.html
%V 267
%X The goal of offline reinforcement learning (RL) is to extract the best possible policy from the previously collected dataset considering the out-of-distribution (OOD) sample issue. Offline model-based RL (MBRL) is a captivating solution capable of alleviating such issues through a state-action transition augmentation with a learned dynamic model. Unfortunately, offline MBRL methods have been observed to fail in sparse rewarded and long-horizon environments for a long time. In this work, we propose a novel MBRL method, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates additional transitions in a geometrically structured representation space, instead of state space. For comprehending long-horizon behaviors efficiently, our main idea is to learn state abstraction, which captures a temporal distance from both trajectory and transition levels of state space. Our experiments empirically confirm that TempDATA outperforms previous offline MBRL methods and achieves matching or surpassing the performance of diffusion-based trajectory augmentation and goal-conditioned RL on the D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen.
APA
Lee, D. & Kwon, M. (2025). Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:33227-33242. Available from https://proceedings.mlr.press/v267/lee25p.html.

Related Material