ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

Xueyi Liu, Zuodong Zhong, Qichao Zhang, Yuxin Guo, Yupeng Zheng, Junli Wang, Dongbin Zhao, Yun-Fu Liu, Zhiguo Su, Yinfeng Gao, Qiao Lin, Chen Huiyong
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3051-3068, 2025.

Abstract

Due to their powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority over mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and a supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% in L2 and 16.1 in driving score on the Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on the unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases.

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-liu25e,
  title     = {ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving},
  author    = {Liu, Xueyi and Zhong, Zuodong and Zhang, Qichao and Guo, Yuxin and Zheng, Yupeng and Wang, Junli and Zhao, Dongbin and Liu, Yun-Fu and Su, Zhiguo and Gao, Yinfeng and Lin, Qiao and Huiyong, Chen},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3051--3068},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/liu25e/liu25e.pdf},
  url       = {https://proceedings.mlr.press/v305/liu25e.html},
  abstract  = {Due to the powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority to mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases.}
}
Endnote
%0 Conference Paper
%T ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
%A Xueyi Liu
%A Zuodong Zhong
%A Qichao Zhang
%A Yuxin Guo
%A Yupeng Zheng
%A Junli Wang
%A Dongbin Zhao
%A Yun-Fu Liu
%A Zhiguo Su
%A Yinfeng Gao
%A Qiao Lin
%A Chen Huiyong
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-liu25e
%I PMLR
%P 3051--3068
%U https://proceedings.mlr.press/v305/liu25e.html
%V 305
%X Due to the powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority to mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases.
APA
Liu, X., Zhong, Z., Zhang, Q., Guo, Y., Zheng, Y., Wang, J., Zhao, D., Liu, Y., Su, Z., Gao, Y., Lin, Q. & Huiyong, C. (2025). ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3051-3068. Available from https://proceedings.mlr.press/v305/liu25e.html.
