MARGE: Improving Math Reasoning with Guided Exploration

Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:18430-18452, 2025.

Abstract

Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up computational responses through self-generated data, yet current methods struggle due to spurious correlated data caused by ineffective exploration across all reasoning stages. To address such challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE’s effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-gao25h, title = {{MARGE}: Improving Math Reasoning with Guided Exploration}, author = {Gao, Jingyue and Lin, Runji and Lu, Keming and Yu, Bowen and Lin, Junyang and Chen, Jianyu}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {18430--18452}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gao25h/gao25h.pdf}, url = {https://proceedings.mlr.press/v267/gao25h.html}, abstract = {Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up computational responses through self-generated data, yet current methods struggle due to spurious correlated data caused by ineffective exploration across all reasoning stages. To address such challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE’s effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.} }
Endnote
%0 Conference Paper %T MARGE: Improving Math Reasoning with Guided Exploration %A Jingyue Gao %A Runji Lin %A Keming Lu %A Bowen Yu %A Junyang Lin %A Jianyu Chen %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-gao25h %I PMLR %P 18430--18452 %U https://proceedings.mlr.press/v267/gao25h.html %V 267 %X Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up computational responses through self-generated data, yet current methods struggle due to spurious correlated data caused by ineffective exploration across all reasoning stages. To address such challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE’s effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.
APA
Gao, J., Lin, R., Lu, K., Yu, B., Lin, J. & Chen, J.. (2025). MARGE: Improving Math Reasoning with Guided Exploration. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:18430-18452 Available from https://proceedings.mlr.press/v267/gao25h.html.

Related Material