PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Bedi
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:45681-45695, 2024.

Abstract

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower-level primitive behavior, our relabeling-based approach mitigates the non-stationarity common in existing hierarchical approaches and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose replacing the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using the sparse rewards provided by the environment. Moreover, to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate subgoals that are feasible for lower-level policies. We perform extensive experiments showing that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves success rates above 50% in challenging, sparse-reward robotic environments where most other baselines fail to make any significant progress.
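To make the mechanisms described in the abstract concrete, below is a minimal PyTorch sketch of the three ideas: a preference-trained reward model, hindsight relabeling of the higher-level replay buffer, and a feasibility regularizer. Everything in it is an illustrative assumption rather than the authors' implementation: the names (RewardModel, preference_loss, relabel_high_level, feasibility_penalty), the network sizes, and the buffer interface are all hypothetical.

import torch
import torch.nn as nn


class RewardModel(nn.Module):
    # Learned higher-level reward r(state, subgoal), trained from preferences.
    def __init__(self, state_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], dim=-1)).squeeze(-1)


def preference_loss(reward_model, seg_a, seg_b, pref):
    # Bradley-Terry-style loss over two trajectory segments, each a
    # (states, subgoals) pair of shape (T, dim). pref = 1.0 if segment A
    # is preferred, 0.0 otherwise. In a primitive-in-the-loop setup the
    # label would come from the environment's sparse reward accumulated
    # along each segment rather than from a human annotator.
    ret_a = reward_model(*seg_a).sum()
    ret_b = reward_model(*seg_b).sum()
    logits = torch.stack([ret_a, ret_b])
    target = torch.tensor([pref, 1.0 - pref])
    return -(target * torch.log_softmax(logits, dim=0)).sum()


def relabel_high_level(buffer, reward_model):
    # Overwrite stored higher-level rewards with the current reward model.
    # Because the learned reward depends only on (state, subgoal) and not
    # on the lower-level primitive's current behavior, relabeled targets
    # stay stationary as the primitive improves.
    for tr in buffer:  # buffer: iterable of transitions with .state/.subgoal/.reward
        with torch.no_grad():
            tr.reward = reward_model(tr.state, tr.subgoal).item()


def feasibility_penalty(lower_value_fn, states, subgoals):
    # One plausible form of primitive-informed regularization (an assumed
    # instantiation, not the paper's exact formulation): penalize subgoals
    # that the lower-level value function judges hard to reach, steering
    # the higher-level policy toward feasible subgoals.
    return -lower_value_fn(states, subgoals).mean()

The point the sketch tries to convey is that the relabeled reward is a function of (state, subgoal) alone, so old replay data does not go stale as the primitive's behavior shifts; how the preference pairs and value function are obtained in practice is specified in the paper itself.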

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-singh24e,
  title     = {{PIPER}: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling},
  author    = {Singh, Utsav and Suttle, Wesley A. and Sadler, Brian M. and Namboodiri, Vinay P. and Bedi, Amrit},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {45681--45695},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/singh24e/singh24e.pdf},
  url       = {https://proceedings.mlr.press/v235/singh24e.html}
}
Endnote
%0 Conference Paper
%T PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
%A Utsav Singh
%A Wesley A. Suttle
%A Brian M. Sadler
%A Vinay P. Namboodiri
%A Amrit Bedi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-singh24e
%I PMLR
%P 45681--45695
%U https://proceedings.mlr.press/v235/singh24e.html
%V 235
APA
Singh, U., Suttle, W.A., Sadler, B.M., Namboodiri, V.P. & Bedi, A. (2024). PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:45681-45695. Available from https://proceedings.mlr.press/v235/singh24e.html.