Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning

Gianluca Drappo, Arnaud Robert, Marcello Restelli, Aldo A. Faisal, Alberto Maria Metelli, Ciara Pike-Burke
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2269-2277, 2025.

Abstract

We study goal-conditioned Hierarchical Reinforcement Learning (HRL), where a high-level agent instructs sub-goals to a low-level agent. Under the assumption of a sparse reward function and known hierarchical decomposition, we propose a new algorithm to learn optimal hierarchical policies. Our algorithm takes a low-level policy as input and is flexible enough to work with a wide range of low-level policies. We show that when the algorithm that computes the low-level policy is optimistic and provably efficient, our HRL algorithm enjoys a regret bound which represents a significant improvement compared to previous results for HRL. Importantly, our regret upper bound highlights key characteristics of the hierarchical decomposition that guarantee that our hierarchical algorithm is more efficient than the best monolithic approach. We support our theoretical findings with experiments that underscore that our method consistently outperforms algorithms that ignore the hierarchical structure.
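
To make the setting concrete, the sketch below shows the control flow of a generic goal-conditioned HRL interaction loop of the kind the abstract describes: a high-level agent proposes a sub-goal, a goal-conditioned low-level policy acts until the sub-goal is reached or a step budget runs out, and both levels then update from the resulting segment. This is a minimal illustration under assumed interfaces (env.reset/env.step, select_subgoal, act, reached, update are hypothetical placeholders), not the paper's actual algorithm, which additionally requires the low-level learner to be optimistic and provably efficient to obtain the stated regret bound.

    def run_hierarchical_episode(env, high_level, low_level, horizon, max_low_level_steps):
        """One episode of goal-conditioned HRL (illustrative sketch, assumed interfaces)."""
        state = env.reset()
        episode_return = 0.0
        t = 0
        done = False
        while t < horizon and not done:
            # High level: choose the next sub-goal given the current state.
            subgoal = high_level.select_subgoal(state)
            segment_start, segment_return = state, 0.0
            # Low level: the goal-conditioned policy acts until the sub-goal is
            # reached, the episode ends, or its step budget is exhausted.
            for _ in range(max_low_level_steps):
                action = low_level.act(state, subgoal)
                state, reward, done = env.step(action)
                segment_return += reward
                t += 1
                if done or low_level.reached(state, subgoal) or t >= horizon:
                    break
            episode_return += segment_return
            # Each level updates from the segment it was responsible for.
            low_level.update(segment_start, subgoal, state)
            high_level.update(segment_start, subgoal, segment_return, state)
        return episode_return
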

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-drappo25a,
  title     = {Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning},
  author    = {Drappo, Gianluca and Robert, Arnaud and Restelli, Marcello and Faisal, Aldo A. and Metelli, Alberto Maria and Pike-Burke, Ciara},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {2269--2277},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/drappo25a/drappo25a.pdf},
  url       = {https://proceedings.mlr.press/v258/drappo25a.html},
  abstract  = {We study goal-conditioned Hierarchical Reinforcement Learning (HRL), where a high-level agent instructs sub-goals to a low-level agent. Under the assumption of a sparse reward function and known hierarchical decomposition, we propose a new algorithm to learn optimal hierarchical policies. Our algorithm takes a low-level policy as input and is flexible enough to work with a wide range of low-level policies. We show that when the algorithm that computes the low-level policy is optimistic and provably efficient, our HRL algorithm enjoys a regret bound which represents a significant improvement compared to previous results for HRL. Importantly, our regret upper bound highlights key characteristics of the hierarchical decomposition that guarantee that our hierarchical algorithm is more efficient than the best monolithic approach. We support our theoretical findings with experiments that underscore that our method consistently outperforms algorithms that ignore the hierarchical structure.}
}
Endnote
%0 Conference Paper
%T Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning
%A Gianluca Drappo
%A Arnaud Robert
%A Marcello Restelli
%A Aldo A. Faisal
%A Alberto Maria Metelli
%A Ciara Pike-Burke
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-drappo25a
%I PMLR
%P 2269--2277
%U https://proceedings.mlr.press/v258/drappo25a.html
%V 258
%X We study goal-conditioned Hierarchical Reinforcement Learning (HRL), where a high-level agent instructs sub-goals to a low-level agent. Under the assumption of a sparse reward function and known hierarchical decomposition, we propose a new algorithm to learn optimal hierarchical policies. Our algorithm takes a low-level policy as input and is flexible enough to work with a wide range of low-level policies. We show that when the algorithm that computes the low-level policy is optimistic and provably efficient, our HRL algorithm enjoys a regret bound which represents a significant improvement compared to previous results for HRL. Importantly, our regret upper bound highlights key characteristics of the hierarchical decomposition that guarantee that our hierarchical algorithm is more efficient than the best monolithic approach. We support our theoretical findings with experiments that underscore that our method consistently outperforms algorithms that ignore the hierarchical structure.
APA
Drappo, G., Robert, A., Restelli, M., Faisal, A. A., Metelli, A. M. & Pike-Burke, C. (2025). Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2269-2277. Available from https://proceedings.mlr.press/v258/drappo25a.html.
