TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

Junik Bae, Kwanyoung Park, Youngwoon Lee
Proceedings of The 8th Conference on Robot Learning, PMLR 270:2183-2204, 2025.

Abstract

Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising paradigm for developing diverse robotic skills without external supervision. However, existing unsupervised GCRL methods often struggle to cover a wide range of states in complex environments due to limited exploration and sparse or noisy goal-reaching rewards. To overcome these challenges, we propose a novel unsupervised GCRL method that leverages TemporaL Distance-aware Representations (TLDR). Based on temporal distance, TLDR selects faraway goals to initiate exploration and computes intrinsic exploration rewards and goal-reaching rewards. Specifically, our exploration policy seeks states with large temporal distances (i.e., covering a large state space), while the goal-conditioned policy learns to minimize the temporal distance to the goal (i.e., reaching the goal). Our results in six simulated locomotion environments demonstrate that TLDR significantly outperforms prior unsupervised GCRL methods in achieving a wide range of states.
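For intuition, below is a minimal, illustrative Python sketch of the three temporal-distance quantities the abstract describes: goal selection, the intrinsic exploration reward, and the goal-reaching reward. It assumes a learned encoder phi whose embedding distances approximate the temporal distance between states; all function names (temporal_distance, select_goal, exploration_reward, goal_reaching_reward) are hypothetical, and the particular form of the exploration reward is our reading of the abstract, not the authors' implementation (see the PDF below for the actual objectives).

import numpy as np

def temporal_distance(phi, s, s_prime):
    # Proxy for the number of environment steps between two states:
    # distance in the learned temporal-distance-aware embedding space.
    return np.linalg.norm(phi(s) - phi(s_prime))

def select_goal(phi, state, candidates):
    # Goal selection: pick the candidate that is temporally farthest
    # from the current state, so exploration starts at the frontier.
    dists = [temporal_distance(phi, state, g) for g in candidates]
    return candidates[int(np.argmax(dists))]

def exploration_reward(phi, s_next, visited_sample):
    # Intrinsic exploration reward (one plausible reading of the abstract):
    # reward states that are temporally far from previously visited states.
    return min(temporal_distance(phi, s_next, v) for v in visited_sample)

def goal_reaching_reward(phi, s_next, goal):
    # Goal-reaching reward: negative temporal distance to the goal, so the
    # goal-conditioned policy learns to minimize it.
    return -temporal_distance(phi, s_next, goal)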

Cite this Paper

BibTeX
@InProceedings{pmlr-v270-bae25a,
  title     = {TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations},
  author    = {Bae, Junik and Park, Kwanyoung and Lee, Youngwoon},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {2183--2204},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/bae25a/bae25a.pdf},
  url       = {https://proceedings.mlr.press/v270/bae25a.html}
}
Endnote
%0 Conference Paper
%T TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations
%A Junik Bae
%A Kwanyoung Park
%A Youngwoon Lee
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-bae25a
%I PMLR
%P 2183--2204
%U https://proceedings.mlr.press/v270/bae25a.html
%V 270
APA
Bae, J., Park, K. & Lee, Y. (2025). TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:2183-2204. Available from https://proceedings.mlr.press/v270/bae25a.html.