A Technical Study into Small Reasoning Language Models
June 16, 2025
Authors: Xialie Zhuang, Peixian Ma, Zhikai Jia, Zheng Cao, Shiwei Liu
cs.AI
Abstract
The ongoing evolution of language models has led to the development of
large-scale architectures that demonstrate exceptional performance across a
wide range of tasks. However, these models come with significant computational
and energy demands, as well as potential privacy implications. In this context,
Small Reasoning Language Models (SRLMs) with approximately 0.5 billion
parameters present a compelling alternative due to their remarkable
computational efficiency and cost-effectiveness, particularly in
resource-constrained environments. Despite these advantages, the limited
capacity of 0.5 billion parameter models poses challenges in handling complex
tasks such as mathematical reasoning and code generation. This research
investigates various training strategies, including supervised fine-tuning
(SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as
their hybrid implementations, to enhance the performance of 0.5B SRLMs. We
analyze effective methodologies to bridge the performance gap between SRLMs and
larger models and present insights into optimal training pipelines tailored for
these smaller architectures. Through extensive experimental validation and
analysis, our work aims to provide actionable recommendations for maximizing
the reasoning capabilities of 0.5B models.
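
Of the training strategies the abstract names, knowledge distillation is the most compact to illustrate. The sketch below shows the standard Hinton-style KD objective: a weighted sum of hard-label cross-entropy and a temperature-softened KL divergence toward a teacher's logits. This is a minimal, dependency-free illustration of the general technique, not the paper's actual training pipeline; the function names, temperature, and weighting are assumptions for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable temperature-scaled softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic KD objective (illustrative, not the paper's exact recipe):
    alpha * CE(student, hard label)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Soft targets: KL divergence between temperature-softened
    # distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    p_teacher_t = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_t, p_student_t))
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

In a hybrid pipeline of the kind the abstract describes, a loss like this would be combined with, or interleaved between, SFT and RL stages; here it only demonstrates how a 0.5B student can be supervised by a larger teacher's output distribution rather than by hard labels alone.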