Efficient Robotic Policy Learning via Latent Space Backward Planning

Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39147-39161, 2025.

Abstract

Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, such forward planning schemes can still result in off-task predictions due to error accumulation, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Backward Planning scheme in Latent space (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io.
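
To make the planning scheme described above concrete, below is a minimal, self-contained PyTorch sketch of the core idea: a predictor recursively maps (current state, farther goal) pairs to nearer intermediate subgoals in latent space, and a subgoal-conditioned policy summarizes the resulting subgoal sequence with a learnable token before producing an action. All module names, network sizes, the recursion depth, and the attention-based summarization are illustrative assumptions, not the authors' released implementation (see the project page for that).

# Conceptual sketch of latent-space backward planning; all names, dimensions,
# and architectural choices here are assumptions for illustration only.
import torch
import torch.nn as nn


class SubgoalPredictor(nn.Module):
    """Predicts an intermediate latent subgoal between the current state and a farther goal."""
    def __init__(self, latent_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))


def backward_plan(state, final_goal, predictor, depth: int = 3):
    """Recursively predict subgoals from the grounded final goal toward the current state."""
    subgoals = [final_goal]
    goal = final_goal
    for _ in range(depth):
        goal = predictor(state, goal)   # each step lands closer to the current state
        subgoals.append(goal)
    return list(reversed(subgoals))     # ordered: nearest subgoal first, final goal last


class SubgoalConditionedPolicy(nn.Module):
    """Policy that summarizes the subgoal sequence with a learnable token via cross-attention."""
    def __init__(self, latent_dim: int = 256, action_dim: int = 7):
        super().__init__()
        self.summary_token = nn.Parameter(torch.zeros(1, 1, latent_dim))
        self.attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim),
            nn.ReLU(),
            nn.Linear(latent_dim, action_dim),
        )

    def forward(self, state: torch.Tensor, subgoals: list) -> torch.Tensor:
        goal_seq = torch.stack(subgoals, dim=1)                    # (B, K, D)
        token = self.summary_token.expand(state.size(0), -1, -1)   # (B, 1, D)
        summary, _ = self.attn(token, goal_seq, goal_seq)          # token attends over subgoals
        return self.head(torch.cat([state, summary.squeeze(1)], dim=-1))


if __name__ == "__main__":
    B, D = 2, 256
    predictor, policy = SubgoalPredictor(D), SubgoalConditionedPolicy(D)
    state, final_goal = torch.randn(B, D), torch.randn(B, D)       # latent encodings of observation / task goal
    subgoals = backward_plan(state, final_goal, predictor, depth=3)
    action = policy(state, subgoals)
    print(action.shape)                                            # torch.Size([2, 7])

Because the recursion starts from the grounded final goal and each new subgoal is conditioned on the previously predicted, farther one, the predicted sequence stays anchored to task completion rather than drifting forward from the current state.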

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25aw,
  title     = {Efficient Robotic Policy Learning via Latent Space Backward Planning},
  author    = {Liu, Dongxiu and Niu, Haoyi and Wang, Zhihao and Zheng, Jinliang and Zheng, Yinan and Ou, Zhonghong and Hu, Jianming and Li, Jianxiong and Zhan, Xianyuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {39147--39161},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25aw/liu25aw.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25aw.html},
  abstract  = {Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Backward Planning scheme in Latent space (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io.}
}
Endnote
%0 Conference Paper
%T Efficient Robotic Policy Learning via Latent Space Backward Planning
%A Dongxiu Liu
%A Haoyi Niu
%A Zhihao Wang
%A Jinliang Zheng
%A Yinan Zheng
%A Zhonghong Ou
%A Jianming Hu
%A Jianxiong Li
%A Xianyuan Zhan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25aw
%I PMLR
%P 39147--39161
%U https://proceedings.mlr.press/v267/liu25aw.html
%V 267
%X Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Backward Planning scheme in Latent space (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io.
APA
Liu, D., Niu, H., Wang, Z., Zheng, J., Zheng, Y., Ou, Z., Hu, J., Li, J. & Zhan, X. (2025). Efficient Robotic Policy Learning via Latent Space Backward Planning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:39147-39161. Available from https://proceedings.mlr.press/v267/liu25aw.html.
