Enhancing Robustness in Multi-Step Reasoning: A Synergistic Approach Combining Planning with Reflective Self-Correction
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:46-56, 2025.
Abstract
Large Reasoning Models (LRMs) have significantly advanced complex problem-solving by incorporating extended chain-of-thought (CoT) reasoning. However, managing error propagation over long inference chains remains a critical challenge. In this work, we propose a novel self-supervised framework, Planning and Reflective Self-Correction, that integrates two complementary mechanisms: a planning phase and a reflection phase. The planning phase decomposes a complex query into a sequence of sub-problems and generates detailed reasoning trajectories, while the reflection phase leverages corrective feedback from erroneous outputs to refine these trajectories. The data sampled through these two mechanisms are used for self-supervised training, further reinforcing the LLM's reasoning capabilities. Experiments on a multi-hop question-answering dataset demonstrate that our approach enhances the model's ability to generate coherent and accurate reasoning paths. Ablation studies further reveal the distinct contributions of planning and reflection to overall performance. Our results suggest that integrating anticipatory planning with reflective self-correction provides a promising avenue for robust long-range inference in LRMs.
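The two-phase loop the abstract describes can be sketched in code. The following is a minimal, hypothetical illustration of the control flow only: the function names (`decompose`, `solve_step`, `critique`) are stand-ins for LLM calls that the paper does not specify, implemented here as toy rule-based stubs so the loop is runnable.

```python
# Hedged sketch of a planning + reflective self-correction loop.
# All helper names are hypothetical; real versions would be LLM calls.

def decompose(query):
    """Planning phase (stub): split a complex query into ordered sub-problems."""
    # Toy decomposition: one sub-problem per semicolon-separated clause.
    return [part.strip() for part in query.split(";") if part.strip()]

def solve_step(sub_problem, trajectory):
    """Produce a candidate answer for one sub-problem (stub for an LLM call)."""
    return f"answer({sub_problem})"

def critique(sub_problem, answer):
    """Reflection phase (stub): return corrective feedback, or None if acceptable."""
    # Toy check: flag empty answers as erroneous.
    return "empty answer" if not answer else None

def plan_and_reflect(query, max_retries=2):
    """Plan sub-problems, then solve each with reflective retries.

    Returns the reasoning trajectory as (sub_problem, answer) pairs,
    which a self-supervised trainer could later filter and learn from.
    """
    trajectory = []
    for sub in decompose(query):
        answer = solve_step(sub, trajectory)
        for _ in range(max_retries):
            feedback = critique(sub, answer)
            if feedback is None:
                break
            # Refine the answer using the corrective feedback.
            answer = solve_step(f"{sub} [feedback: {feedback}]", trajectory)
        trajectory.append((sub, answer))
    return trajectory

trajectory = plan_and_reflect("Who directed Inception; what year was it released")
print(trajectory)
```

In a faithful implementation, the accepted trajectories (and the feedback-corrected ones) would be collected into a dataset for the self-supervised fine-tuning step the abstract mentions.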