Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Zishun Yu, Tengyu Xu, Di Jin, Karthik Abinav Sankararaman, Yun He, Wenxuan Zhou, Zhouhao Zeng, Eryk Helenowski, Chen Zhu, Sinong Wang, Hao Ma, Han Fang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:73304-73325, 2025.

Abstract

Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to “understand” the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a $4.14$% and $5.74$% absolute improvement ($8.08$% and $11.2$% relative improvement) on MATH500 using $2.16$x and $4.32$x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately $2$x those of self-consistency under the same budgets.
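The abstract frames IBPO as utility maximization subject to an inference budget constraint. A minimal sketch of such an objective, in notation that is an assumption here rather than taken from the paper: for a policy $\pi$, query distribution $\mathcal{D}$, utility $u$, and per-response token cost $c$,

```latex
\max_{\pi} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}
\bigl[\, u(x, y) \,\bigr]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}
\bigl[\, c(y) \,\bigr] \;\le\; B,
```

where $B$ is the total inference budget. Under a formulation of this shape, a constrained policy-optimization method can in principle learn to spend more of $B$ on harder queries and less on trivial ones, which matches the adaptive behavior the abstract reports.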

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yu25s,
  title     = {Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization},
  author    = {Yu, Zishun and Xu, Tengyu and Jin, Di and Sankararaman, Karthik Abinav and He, Yun and Zhou, Wenxuan and Zeng, Zhouhao and Helenowski, Eryk and Zhu, Chen and Wang, Sinong and Ma, Hao and Fang, Han},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {73304--73325},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yu25s/yu25s.pdf},
  url       = {https://proceedings.mlr.press/v267/yu25s.html},
  abstract  = {Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to “understand” the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a $4.14$% and $5.74$% absolute improvement ($8.08$% and $11.2$% relative improvement) on MATH500 using $2.16$x and $4.32$x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately $2$x those of self-consistency under the same budgets.}
}
Endnote
%0 Conference Paper
%T Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
%A Zishun Yu
%A Tengyu Xu
%A Di Jin
%A Karthik Abinav Sankararaman
%A Yun He
%A Wenxuan Zhou
%A Zhouhao Zeng
%A Eryk Helenowski
%A Chen Zhu
%A Sinong Wang
%A Hao Ma
%A Han Fang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yu25s
%I PMLR
%P 73304--73325
%U https://proceedings.mlr.press/v267/yu25s.html
%V 267
%X Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to “understand” the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a $4.14$% and $5.74$% absolute improvement ($8.08$% and $11.2$% relative improvement) on MATH500 using $2.16$x and $4.32$x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately $2$x those of self-consistency under the same budgets.
APA
Yu, Z., Xu, T., Jin, D., Sankararaman, K.A., He, Y., Zhou, W., Zeng, Z., Helenowski, E., Zhu, C., Wang, S., Ma, H. & Fang, H. (2025). Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:73304-73325. Available from https://proceedings.mlr.press/v267/yu25s.html.