$Door(s)$: Junction State Estimation for Efficient Exploration in Reinforcement Learning

Benjamin Fele, Jan Babic
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3341-3356, 2025.

Abstract

Exploration is one of the key bottlenecks for efficient learning in reinforcement learning, especially in the presence of sparse rewards. One way to traverse the environment faster is by passing through junctions, or metaphorical doors, in the state space. We propose a novel heuristic, $Door(s)$, focused on such narrow passages that serve as pathways to a large number of other states. Our approach works by estimating the state occupancy distribution and allows computation of its entropy, which forms the basis for our measure. Its computation is more sample-efficient than that of other similar methods, and it works robustly over longer horizons. Our results highlight the detection of dead-end states, show increased exploration efficiency, and demonstrate that $Door(s)$ encodes specific behaviors useful for downstream learning of various robotic manipulation tasks.
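
To make the idea above concrete, the minimal Python sketch below estimates a tabular successor-occupancy distribution from collected trajectories and scores each state by the entropy of that distribution: states through which many distinct other states are reached score high (junction, or "door", candidates), while dead ends score near zero. The function names, the finite-horizon counting, and the toy trajectories are illustrative assumptions, not the exact $Door(s)$ estimator from the paper.

import numpy as np
from collections import defaultdict

# Illustrative sketch only; the exact Door(s) formulation in the paper may differ.

def successor_occupancy(trajectories, horizon=2):
    """For every state s, count which states are visited within `horizon`
    steps after s across a batch of trajectories."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for t, s in enumerate(traj):
            for s_next in traj[t + 1 : t + 1 + horizon]:
                counts[s][s_next] += 1
    return counts

def occupancy_entropy(counts):
    """Shannon entropy of each state's empirical successor-occupancy
    distribution; high entropy suggests a junction ('door'), low entropy
    a dead end."""
    scores = {}
    for s, succ in counts.items():
        total = sum(succ.values())
        p = np.array([c / total for c in succ.values()])
        scores[s] = float(-(p * np.log(p)).sum())
    return scores

# Toy example: two corridors that branch at state 2.
trajectories = [
    [0, 1, 2, 3, 4],  # branch A
    [0, 1, 2, 5, 6],  # branch B through the same junction
    [0, 1, 2, 3, 4],
]
scores = occupancy_entropy(successor_occupancy(trajectories, horizon=2))
print(max(scores, key=scores.get))  # -> 2, the branch point

In this reading, such entropy scores could serve as an intrinsic signal that steers exploration toward junction states, which is the role the abstract describes for the $Door(s)$ measure.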

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-fele25a,
  title     = {$Door(s)$: Junction State Estimation for Efficient Exploration in Reinforcement Learning},
  author    = {Fele, Benjamin and Babic, Jan},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3341--3356},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/fele25a/fele25a.pdf},
  url       = {https://proceedings.mlr.press/v305/fele25a.html},
  abstract  = {Exploration is one of the important bottlenecks for efficient learning in reinforcement learning, especially in the presence of sparse rewards. One way to traverse the environment faster is by passing through junctions, or metaphorical doors, in the state space. We propose a novel heuristic, $Door(s)$, focused on such narrow passages that serve as pathways to a large number of other states. Our approach works by estimating the state occupancy distribution and allows computation of its entropy, which forms the basis for our measure. Its computation is more sample-efficient compared to other similar methods and robustly works over longer horizons. Our results highlight the detection of dead-end states, show increased exploration efficiency, and demonstrate that $Door(s)$ encodes specific behaviors useful for downstream learning of various robotic manipulation tasks.}
}
Endnote
%0 Conference Paper
%T $Door(s)$: Junction State Estimation for Efficient Exploration in Reinforcement Learning
%A Benjamin Fele
%A Jan Babic
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-fele25a
%I PMLR
%P 3341--3356
%U https://proceedings.mlr.press/v305/fele25a.html
%V 305
%X Exploration is one of the important bottlenecks for efficient learning in reinforcement learning, especially in the presence of sparse rewards. One way to traverse the environment faster is by passing through junctions, or metaphorical doors, in the state space. We propose a novel heuristic, $Door(s)$, focused on such narrow passages that serve as pathways to a large number of other states. Our approach works by estimating the state occupancy distribution and allows computation of its entropy, which forms the basis for our measure. Its computation is more sample-efficient compared to other similar methods and robustly works over longer horizons. Our results highlight the detection of dead-end states, show increased exploration efficiency, and demonstrate that $Door(s)$ encodes specific behaviors useful for downstream learning of various robotic manipulation tasks.
APA
Fele, B. & Babic, J. (2025). $Door(s)$: Junction State Estimation for Efficient Exploration in Reinforcement Learning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3341-3356. Available from https://proceedings.mlr.press/v305/fele25a.html.
