Enhancing Robustness in Multi-Step Reasoning: A Synergistic Approach Combining Planning with Reflective Self-Correction

Xinyuan Wang, Danli Wang, Hehao Zhang, Bo You, Xueen Li, Yu Gu
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:46-56, 2025.

Abstract

Large Reasoning Models (LRMs) have significantly advanced complex problem-solving capabilities by incorporating extended chain-of-thought (CoT) reasoning. However, managing error propagation over long inference chains remains a critical challenge. In this work, we propose a novel self-supervised framework, Planning and Reflective Self-Correction, that integrates two complementary mechanisms: a planning phase and a reflection phase. The planning phase decomposes complex queries into streaming sub-problems and generates detailed reasoning trajectories, while the reflection phase leverages corrective feedback from erroneous outputs to refine these trajectories. The datasets sampled through these two mechanisms are used for self-supervised training, further reinforcing the LLM’s reasoning capabilities. Experiments conducted on a multi-hop Question Answering dataset demonstrate that our approach enhances the model’s ability to generate coherent and accurate reasoning paths. Ablation studies further reveal the distinct contributions of planning and reflection to the overall performance. Our results suggest that integrating anticipatory planning with reflective self-correction provides a promising avenue for robust long-range inference in LRMs.
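The abstract describes a two-phase data-generation loop: a planning phase that decomposes a query and produces a reasoning trajectory, and a reflection phase that revises trajectories whose answers turn out to be wrong, with the accepted trajectories then used for self-supervised training. The sketch below is only an illustrative reconstruction of that loop under assumed interfaces; the callables `plan`, `solve`, and `reflect`, the `Trajectory` container, and the exact-match answer check are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """A reasoning trace: the decomposed sub-problems plus the final answer."""
    question: str
    sub_problems: List[str]
    reasoning: str
    answer: str


def build_training_set(
    qa_pairs: List[Tuple[str, str]],                    # (question, gold answer) pairs
    plan: Callable[[str], List[str]],                   # planning: question -> sub-problems
    solve: Callable[[str, List[str]], Trajectory],      # follow the plan, produce a trajectory
    reflect: Callable[[Trajectory, str], Trajectory],   # revise a wrong trajectory using feedback
    max_reflections: int = 2,
) -> List[Trajectory]:
    """Collect trajectories for self-supervised fine-tuning (assumed procedure)."""
    accepted: List[Trajectory] = []
    for question, gold in qa_pairs:
        # Planning phase: decompose the query and generate a reasoning trajectory.
        sub_problems = plan(question)
        traj = solve(question, sub_problems)

        # Reflection phase: use corrective feedback from the erroneous output
        # to refine the trajectory, up to a fixed number of attempts.
        attempts = 0
        while traj.answer.strip().lower() != gold.strip().lower() and attempts < max_reflections:
            traj = reflect(traj, gold)
            attempts += 1

        # Only trajectories ending in a verified answer become training samples.
        if traj.answer.strip().lower() == gold.strip().lower():
            accepted.append(traj)
    return accepted
```

In practice the three callables would wrap prompted calls to the base model, and the exact-string answer check stands in for whatever verification the authors use; the point of the sketch is only the flow from planning to reflection to a filtered self-supervised training set.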

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-wang25a,
  title     = {Enhancing Robustness in Multi-Step Reasoning: A Synergistic Approach Combining Planning with Reflective Self-Correction},
  author    = {Wang, Xinyuan and Wang, Danli and Zhang, Hehao and You, Bo and Li, Xueen and Gu, Yu},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages     = {46--56},
  year      = {2025},
  editor    = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume    = {278},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/wang25a/wang25a.pdf},
  url       = {https://proceedings.mlr.press/v278/wang25a.html},
  abstract  = {Large Reasoning Models (LRMs) have significantly advanced complex problem-solving capabilities by incorporating extended chain-of-thought (CoT) reasoning. However, managing error propagation over long inference chains remains a critical challenge. In this work, we propose a novel self-supervised framework named \textbf{P}lanning and \textbf{Re}flective \textbf{S}elf-\textbf{C}orrection that integrates two complementary mechanisms: planning phase and reflection phase. The planning phase decomposes complex queries into streaming sub-problems and generates detailed reasoning trajectories, while the reflection phase leverages corrective feedback from erroneous outputs to refine these trajectories. The datasets sampled through these two mechanisms are used for self-supervised training, further reinforcing the LLM's reasoning capabilities. Experiments conducted on the multi-hop Question Answering dataset demonstrate that our approach enhances the model's ability to generate coherent and accurate reasoning paths. Ablation studies further reveal the distinct contributions of planning and reflection to the overall performance. Our results suggest that integrating anticipatory planning with reflective self-correction provides a promising avenue for robust long-range inference in LRMs.}
}
Endnote
%0 Conference Paper
%T Enhancing Robustness in Multi-Step Reasoning: A Synergistic Approach Combining Planning with Reflective Self-Correction
%A Xinyuan Wang
%A Danli Wang
%A Hehao Zhang
%A Bo You
%A Xueen Li
%A Yu Gu
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-wang25a
%I PMLR
%P 46--56
%U https://proceedings.mlr.press/v278/wang25a.html
%V 278
%X Large Reasoning Models (LRMs) have significantly advanced complex problem-solving capabilities by incorporating extended chain-of-thought (CoT) reasoning. However, managing error propagation over long inference chains remains a critical challenge. In this work, we propose a novel self-supervised framework named Planning and Reflective Self-Correction that integrates two complementary mechanisms: planning phase and reflection phase. The planning phase decomposes complex queries into streaming sub-problems and generates detailed reasoning trajectories, while the reflection phase leverages corrective feedback from erroneous outputs to refine these trajectories. The datasets sampled through these two mechanisms are used for self-supervised training, further reinforcing the LLM’s reasoning capabilities. Experiments conducted on the multi-hop Question Answering dataset demonstrate that our approach enhances the model’s ability to generate coherent and accurate reasoning paths. Ablation studies further reveal the distinct contributions of planning and reflection to the overall performance. Our results suggest that integrating anticipatory planning with reflective self-correction provides a promising avenue for robust long-range inference in LRMs.
APA
Wang, X., Wang, D., Zhang, H., You, B., Li, X. & Gu, Y. (2025). Enhancing Robustness in Multi-Step Reasoning: A Synergistic Approach Combining Planning with Reflective Self-Correction. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:46-56. Available from https://proceedings.mlr.press/v278/wang25a.html.
