Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding?

Ren-Biao Liu, Anqi Li, Chaoding Yang, Hui Sun, Ming Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:38809-38826, 2025.

Abstract

Large Language Models (LLMs) have demonstrated exceptional performance in code generation, becoming increasingly vital for software engineering and development. Recently, Chain-of-Thought (CoT) has proven effective for complex tasks by prompting LLMs to reason step-by-step and provide a final answer. However, research on how LLMs learn to reason with CoT data for code generation remains limited. In this work, we revisit classic CoT training, which typically learns reasoning steps before the final answer. We synthesize a dataset to separate the CoT process from code solutions and then conduct extensive experiments to study how CoT works in code generation empirically. We observe counterintuitive phenomena, suggesting that the traditional training paradigm may not yield benefits for code generation. Instead, training LLMs to generate code first and then output the CoT to explain reasoning steps for code generation is more effective. Specifically, our results indicate that a 9.86% relative performance improvement can be achieved simply by changing the order between CoT and code. Our findings provide valuable insights into leveraging CoT to enhance the reasoning capabilities of CodeLLMs and improve code generation.
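The abstract's central manipulation is purely an ordering choice: whether the CoT precedes or follows the code in each supervised training example. The snippet below is a minimal sketch of how such examples could be assembled in both orderings; the field names, prompt template, and helper functions are illustrative assumptions, not the authors' released data pipeline.

```python
# Minimal sketch of the two supervision orderings discussed in the abstract.
# The field names ("instruction", "cot", "code") and the prompt template are
# illustrative assumptions, not the authors' released data format.

def build_target(cot: str, code: str, cot_first: bool) -> str:
    """Assemble the supervision target in either ordering."""
    if cot_first:
        # Classic CoT training: reason step by step, then give the final code.
        return f"{cot}\n\n{code}"
    # Reversed ordering studied in the paper: code first, then a CoT-style
    # explanation of the reasoning behind it.
    return f"{code}\n\n{cot}"


def to_sft_example(sample: dict, cot_first: bool) -> dict:
    """Turn one (instruction, CoT, code) triple into a prompt/target pair."""
    prompt = f"### Instruction:\n{sample['instruction']}\n\n### Response:\n"
    return {
        "prompt": prompt,
        "target": build_target(sample["cot"], sample["code"], cot_first),
    }


if __name__ == "__main__":
    sample = {
        "instruction": "Write a function that returns the n-th Fibonacci number.",
        "cot": "Iterate from 0 to n, keeping only the last two values.",
        "code": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a",
    }
    print(to_sft_example(sample, cot_first=True)["target"])   # CoT -> code (classic)
    print(to_sft_example(sample, cot_first=False)["target"])  # code -> CoT (reversed)
```

Under the classic ordering the model is trained to emit its reasoning before the solution; under the reversed ordering it emits the code first and then explains it, which is the setup the abstract reports as more effective (a 9.86% relative improvement).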

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25ah,
  title     = {Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding?},
  author    = {Liu, Ren-Biao and Li, Anqi and Yang, Chaoding and Sun, Hui and Li, Ming},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {38809--38826},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25ah/liu25ah.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25ah.html},
  abstract  = {Large Language Models (LLMs) have demonstrated exceptional performance in code generation, becoming increasingly vital for software engineering and development. Recently, Chain-of-Thought (CoT) has proven effective for complex tasks by prompting LLMs to reason step-by-step and provide a final answer. However, research on how LLMs learn to reason with CoT data for code generation remains limited. In this work, we revisit classic CoT training, which typically learns reasoning steps before the final answer. We synthesize a dataset to separate the CoT process from code solutions and then conduct extensive experiments to study how CoT works in code generation empirically. We observe counterintuitive phenomena, suggesting that the traditional training paradigm may not yield benefits for code generation. Instead, training LLMs to generate code first and then output the CoT to explain reasoning steps for code generation is more effective. Specifically, our results indicate that a 9.86% relative performance improvement can be achieved simply by changing the order between CoT and code. Our findings provide valuable insights into leveraging CoT to enhance the reasoning capabilities of CodeLLMs and improve code generation.}
}
APA
Liu, R., Li, A., Yang, C., Sun, H., & Li, M. (2025). Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding? Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:38809-38826. Available from https://proceedings.mlr.press/v267/liu25ah.html.
