Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs

Xialie Zhuang, Peixian MA, Zhikai Jia, Zane Cao, Shiwei Liu
Conference on Parsimony and Learning, PMLR 328:855-869, 2026.

Abstract

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-zhuang26a,
  title = {Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs},
  author = {Zhuang, Xialie and MA, Peixian and Jia, Zhikai and Cao, Zane and Liu, Shiwei},
  booktitle = {Conference on Parsimony and Learning},
  pages = {855--869},
  year = {2026},
  editor = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume = {328},
  series = {Proceedings of Machine Learning Research},
  month = {23--26 Mar},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/zhuang26a/zhuang26a.pdf},
  url = {https://proceedings.mlr.press/v328/zhuang26a.html},
  abstract = {The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.}
}
Endnote
%0 Conference Paper
%T Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs
%A Xialie Zhuang
%A Peixian MA
%A Zhikai Jia
%A Zane Cao
%A Shiwei Liu
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-zhuang26a
%I PMLR
%P 855--869
%U https://proceedings.mlr.press/v328/zhuang26a.html
%V 328
%X The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.
APA
Zhuang, X., MA, P., Jia, Z., Cao, Z. & Liu, S. (2026). Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:855-869. Available from https://proceedings.mlr.press/v328/zhuang26a.html.

Related Material