AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Yuliang Liu, Junjie Lu, Chaofeng Qu, Zhaoling Chen, Zefan Cai, Jason Klein Liu, Chonghan Liu, Yunhui Xia, Li Zhao, Jiang Bian, Chuheng Zhang, Wei Shen, Zhouhan Lin
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39016-39031, 2025.

Abstract

Current approaches for training Process Reward Models (PRMs) often involve decomposing responses into multiple reasoning steps using rule-based techniques, such as using predefined placeholder tokens or setting each reasoning step to a fixed length. These approaches overlook the fact that certain words do not usually indicate true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model’s confidence in predicting the next word. This yields more information about decision-making at each step and improves downstream tasks such as reward model training. Moreover, our method requires no manual annotation. Experiments with AdaptiveStep-trained PRMs on mathematical reasoning and code generation show that the resulting PRM achieves state-of-the-art Best-of-N performance, surpassing the greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. We also provide a thorough analysis and case study of its performance, transferability, and generalization capabilities. Our code is available at https://github.com/Lux0926/ASPRM.
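
As a rough illustration of the core idea, the sketch below scores each response token by the probability the model assigned to it and opens a new reasoning step wherever that confidence dips below a threshold. This is not the authors' implementation (see the linked repository for that): the split_by_confidence helper, the fixed 0.8 threshold, and the question/response handling are illustrative assumptions, and the paper's actual pipeline may, for instance, derive the threshold from the confidence distribution of sampled responses.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def split_by_confidence(model, tokenizer, question, response, threshold=0.8):
    """Hypothetical helper: split `response` into steps wherever the model's
    confidence in the token it actually produced drops below `threshold`."""
    enc = tokenizer(question + response, return_tensors="pt")
    ids = enc["input_ids"][0]
    with torch.no_grad():
        logits = model(**enc).logits[0]            # (seq_len, vocab)
    # Probability the model gave to each realized next token.
    probs = torch.softmax(logits[:-1], dim=-1)     # predictions for tokens 1..seq_len-1
    conf = probs.gather(-1, ids[1:].unsqueeze(-1)).squeeze(-1)

    # Split only inside the response; the question prefix is skipped.
    # (Token counts at the boundary are approximate after re-tokenization.)
    q_len = len(tokenizer(question)["input_ids"])
    steps, current = [], []
    for i, tok in enumerate(ids.tolist()[1:]):
        pos = i + 1                                # index of this token in the full sequence
        if pos < q_len:
            continue
        if conf[i] < threshold and current:        # low confidence -> new step boundary
            steps.append(tokenizer.decode(current))
            current = []
        current.append(tok)
    if current:
        steps.append(tokenizer.decode(current))
    return steps

# Example usage (any Hugging Face causal LM checkpoint would do, e.g. a small local model):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# steps = split_by_confidence(lm, tok, "Q: 2 + 3 * 4 = ? A:", " 3 * 4 is 12, plus 2 gives 14.")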

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25aq,
  title = {{A}daptive{S}tep: Automatically Dividing Reasoning Step through Model Confidence},
  author = {Liu, Yuliang and Lu, Junjie and Qu, Chaofeng and Chen, Zhaoling and Cai, Zefan and Liu, Jason Klein and Liu, Chonghan and Xia, Yunhui and Zhao, Li and Bian, Jiang and Zhang, Chuheng and Shen, Wei and Lin, Zhouhan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {39016--39031},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25aq/liu25aq.pdf},
  url = {https://proceedings.mlr.press/v267/liu25aq.html},
  abstract = {Current approaches for training Process Reward Models (PRMs) often involve decomposing responses into multiple reasoning steps using rule-based techniques, such as using predefined placeholder tokens or setting the reasoning step’s length to a fixed size. These approaches overlook the fact that certain words don’t usually indicate true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model’s confidence in predicting the next word, offering more information on decision-making at each step, improving downstream tasks like reward model training. Moreover, our method requires no manual annotation. Experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation show that the outcome PRM achieves state-of-the-art Best-of-N performance, surpassing greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. We also provide a thorough analysis and case study on its performance, transferability, and generalization capabilities. We provide our code on https://github.com/Lux0926/ASPRM.}
}
Endnote
%0 Conference Paper
%T AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
%A Yuliang Liu
%A Junjie Lu
%A Chaofeng Qu
%A Zhaoling Chen
%A Zefan Cai
%A Jason Klein Liu
%A Chonghan Liu
%A Yunhui Xia
%A Li Zhao
%A Jiang Bian
%A Chuheng Zhang
%A Wei Shen
%A Zhouhan Lin
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25aq
%I PMLR
%P 39016--39031
%U https://proceedings.mlr.press/v267/liu25aq.html
%V 267
%X Current approaches for training Process Reward Models (PRMs) often involve decomposing responses into multiple reasoning steps using rule-based techniques, such as using predefined placeholder tokens or setting the reasoning step’s length to a fixed size. These approaches overlook the fact that certain words don’t usually indicate true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model’s confidence in predicting the next word, offering more information on decision-making at each step, improving downstream tasks like reward model training. Moreover, our method requires no manual annotation. Experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation show that the outcome PRM achieves state-of-the-art Best-of-N performance, surpassing greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. We also provide a thorough analysis and case study on its performance, transferability, and generalization capabilities. We provide our code on https://github.com/Lux0926/ASPRM.
APA
Liu, Y., Lu, J., Qu, C., Chen, Z., Cai, Z., Liu, J.K., Liu, C., Xia, Y., Zhao, L., Bian, J., Zhang, C., Shen, W. & Lin, Z. (2025). AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:39016-39031. Available from https://proceedings.mlr.press/v267/liu25aq.html.