Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance

RenMing Huang, Shaochong Liu, Yunqiang Pei, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1744-1762, 2025.

Abstract

In this work, we address the challenging problem of long-horizon goal-reaching policy learning from non-expert, action-free observation data. Unlike fully labeled expert data, our data is more accessible and avoids the costly process of action labeling. Additionally, compared to online learning, which often involves aimless exploration, our data provides useful guidance for more efficient exploration. To achieve our goal, we propose a novel subgoal guidance learning strategy. The motivation behind this strategy is that long-horizon goals offer limited guidance for efficient exploration and accurate state transition. We develop a diffusion strategy-based high-level policy to generate reasonable subgoals as waypoints, preferring states that more easily lead to the final goal. Additionally, we learn state-goal value functions to encourage efficient subgoal reaching. These two components naturally integrate into the off-policy actor-critic framework, enabling efficient goal attainment through informative exploration. We evaluate our method on complex robotic navigation and manipulation tasks, demonstrating a significant performance advantage over existing methods. Our ablation study further shows that our method is robust to observation data with various corruptions.
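The abstract describes a hierarchical scheme: a high-level policy proposes subgoals as waypoints, a learned state-goal value function prefers subgoals that both are reachable and lead toward the final goal, and a low-level policy pursues the current subgoal. As a rough illustration only, the toy sketch below replaces the paper's learned diffusion model and learned value functions with a hand-written negative-distance proxy on a 2-D point environment; `subgoal_value`, `select_subgoal`, and `rollout` are hypothetical names invented here, not the authors' code.

```python
import numpy as np

def subgoal_value(state, goal):
    # Stand-in for a learned state-goal value V(s, g): higher when s is
    # closer to g. The paper learns such values from observation data;
    # here a negative Euclidean distance serves as an illustrative proxy.
    return -np.linalg.norm(np.asarray(state, dtype=float) -
                           np.asarray(goal, dtype=float))

def select_subgoal(state, final_goal, candidate_states):
    # High-level policy stand-in: instead of sampling from a learned
    # diffusion model, score candidate states (e.g. drawn from the
    # observation dataset) and prefer waypoints that are both reachable
    # from the current state and likely to lead to the final goal.
    scores = [subgoal_value(state, c) + subgoal_value(c, final_goal)
              for c in candidate_states]
    return candidate_states[int(np.argmax(scores))]

def rollout(start, final_goal, candidates, steps=50, step_size=0.5):
    # Low-level policy stand-in: greedily step toward the current
    # subgoal, re-selecting a subgoal at every step.
    state = np.asarray(start, dtype=float)
    goal = np.asarray(final_goal, dtype=float)
    for _ in range(steps):
        if np.linalg.norm(state - goal) < 1e-6:
            break
        sg = np.asarray(select_subgoal(state, goal, candidates), dtype=float)
        direction = sg - state
        dist = np.linalg.norm(direction)
        if dist < 1e-8:
            direction, dist = goal - state, np.linalg.norm(goal - state)
        state = state + min(step_size, dist) * direction / max(dist, 1e-8)
    return state
```

In the actual method, both the subgoal generator and the value functions are learned and integrated into an off-policy actor-critic loop; this sketch only conveys the decomposition of a long-horizon goal into intermediate waypoints.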

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-huang25c,
  title     = {Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance},
  author    = {Huang, RenMing and Liu, Shaochong and Pei, Yunqiang and Wang, Peng and Wang, Guoqing and Yang, Yang and Shen, Heng Tao},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {1744--1762},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/huang25c/huang25c.pdf},
  url       = {https://proceedings.mlr.press/v270/huang25c.html},
  abstract  = {In this work, we address the challenging problem of long-horizon goal-reaching policy learning from non-expert, action-free observation data. Unlike fully labeled expert data, our data is more accessible and avoids the costly process of action labeling. Additionally, compared to online learning, which often involves aimless exploration, our data provides useful guidance for more efficient exploration. To achieve our goal, we propose a novel subgoal guidance learning strategy. The motivation behind this strategy is that long-horizon goals offer limited guidance for efficient exploration and accurate state transition. We develop a diffusion strategy-based high-level policy to generate reasonable subgoals as waypoints, preferring states that more easily lead to the final goal. Additionally, we learn state-goal value functions to encourage efficient subgoal reaching. These two components naturally integrate into the off-policy actor-critic framework, enabling efficient goal attainment through informative exploration. We evaluate our method on complex robotic navigation and manipulation tasks, demonstrating a significant performance advantage over existing methods. Our ablation study further shows that our method is robust to observation data with various corruptions.}
}
Endnote
%0 Conference Paper
%T Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance
%A RenMing Huang
%A Shaochong Liu
%A Yunqiang Pei
%A Peng Wang
%A Guoqing Wang
%A Yang Yang
%A Heng Tao Shen
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-huang25c
%I PMLR
%P 1744--1762
%U https://proceedings.mlr.press/v270/huang25c.html
%V 270
%X In this work, we address the challenging problem of long-horizon goal-reaching policy learning from non-expert, action-free observation data. Unlike fully labeled expert data, our data is more accessible and avoids the costly process of action labeling. Additionally, compared to online learning, which often involves aimless exploration, our data provides useful guidance for more efficient exploration. To achieve our goal, we propose a novel subgoal guidance learning strategy. The motivation behind this strategy is that long-horizon goals offer limited guidance for efficient exploration and accurate state transition. We develop a diffusion strategy-based high-level policy to generate reasonable subgoals as waypoints, preferring states that more easily lead to the final goal. Additionally, we learn state-goal value functions to encourage efficient subgoal reaching. These two components naturally integrate into the off-policy actor-critic framework, enabling efficient goal attainment through informative exploration. We evaluate our method on complex robotic navigation and manipulation tasks, demonstrating a significant performance advantage over existing methods. Our ablation study further shows that our method is robust to observation data with various corruptions.
APA
Huang, R., Liu, S., Pei, Y., Wang, P., Wang, G., Yang, Y. & Shen, H.T. (2025). Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1744-1762. Available from https://proceedings.mlr.press/v270/huang25c.html.