Rethinking Chain-of-Thought from the Perspective of Self-Training

Zongqian Wu, Baoduo Xu, Ruochen Cui, Mengmeng Zhan, Xiaofeng Zhu, Lei Feng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:67917-67937, 2025.

Abstract

Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that both CoT reasoning and self-training share the same core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses the limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.
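To make the adaptive-iteration idea in the abstract concrete, here is a minimal Python sketch: a rationale is refined iteratively, and refinement stops early once consecutive rationales become too similar (a proxy for "no new information"), which is what avoiding over-reasoning amounts to. All names here (`llm`, the prompt templates, the similarity threshold) are hypothetical illustrations, not the authors' implementation; see the linked repository for the actual code.

```python
# Illustrative sketch (assumptions, not the authors' ST-COT implementation):
# iteratively refine a chain-of-thought and stop when consecutive rationales
# barely change, to avoid over-reasoning and redundant iterations.

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Cheap lexical similarity between two reasoning traces, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def adaptive_cot(llm, question: str, task_prompt: str,
                 max_iters: int = 5, sim_threshold: float = 0.9) -> str:
    """Refine a rationale until it converges or max_iters is reached.

    llm: callable(prompt: str) -> str, e.g. a wrapper around a model API.
    task_prompt: task-specific instruction seeding the initial rationale
    (standing in for the paper's task-specific prompt module).
    """
    rationale = llm(
        f"{task_prompt}\n\nQuestion: {question}\nLet's think step by step."
    )
    for _ in range(max_iters - 1):
        revised = llm(
            f"Question: {question}\n"
            f"Previous reasoning:\n{rationale}\n"
            "Refine this reasoning, correcting any mistakes."
        )
        # Stop once consecutive iterations are nearly identical: further
        # refinement would add compute without reducing uncertainty.
        if similarity(rationale, revised) >= sim_threshold:
            rationale = revised
            break
        rationale = revised
    return llm(f"Question: {question}\nReasoning:\n{rationale}\nFinal answer:")
```

In practice `llm` would wrap an API or local model call, and a semantic similarity measure could replace the lexical one used here for simplicity.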

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wu25al,
  title     = {Rethinking Chain-of-Thought from the Perspective of Self-Training},
  author    = {Wu, Zongqian and Xu, Baoduo and Cui, Ruochen and Zhan, Mengmeng and Zhu, Xiaofeng and Feng, Lei},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {67917--67937},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wu25al/wu25al.pdf},
  url       = {https://proceedings.mlr.press/v267/wu25al.html},
  abstract  = {Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that both CoT reasoning and self-training share the same core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses the limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.}
}
Endnote
%0 Conference Paper
%T Rethinking Chain-of-Thought from the Perspective of Self-Training
%A Zongqian Wu
%A Baoduo Xu
%A Ruochen Cui
%A Mengmeng Zhan
%A Xiaofeng Zhu
%A Lei Feng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wu25al
%I PMLR
%P 67917--67937
%U https://proceedings.mlr.press/v267/wu25al.html
%V 267
%X Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that both CoT reasoning and self-training share the same core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses the limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.
APA
Wu, Z., Xu, B., Cui, R., Zhan, M., Zhu, X., & Feng, L. (2025). Rethinking Chain-of-Thought from the Perspective of Self-Training. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:67917-67937. Available from https://proceedings.mlr.press/v267/wu25al.html.
