
A Technical Study into Small Reasoning Language Models

June 16, 2025
Authors: Xialie Zhuang, Peixian Ma, Zhikai Jia, Zheng Cao, Shiwei Liu
cs.AI

Abstract

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5-billion-parameter models poses challenges in handling complex tasks such as mathematical reasoning and code generation. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.
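The abstract names SFT, KD, and RL (and their hybrids) as the candidate training strategies for 0.5B models. As a rough illustration of how an SFT objective and a distillation objective can be combined for a small student, the sketch below mixes the standard next-token cross-entropy with a temperature-scaled KL term against a larger teacher. This is a minimal, hedged sketch, not the paper's released code: the model names, temperature, mixing weight `alpha`, and the `batch` layout (`input_ids` plus `attention_mask`) are illustrative assumptions.

```python
# Illustrative SFT + KD loss sketch (assumptions, not the paper's implementation).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

student_name = "Qwen/Qwen2.5-0.5B"  # ~0.5B-parameter student (assumed choice)
teacher_name = "Qwen/Qwen2.5-7B"    # larger teacher (assumed choice)

student = AutoModelForCausalLM.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()

def sft_kd_loss(batch, alpha=0.5, temperature=2.0):
    """Weighted sum of the SFT cross-entropy and a KL distillation term.

    `batch` is assumed to hold `input_ids` and `attention_mask` tensors.
    """
    labels = batch["input_ids"].clone()
    outputs = student(**batch, labels=labels)
    sft_loss = outputs.loss  # standard next-token cross-entropy

    with torch.no_grad():
        teacher_logits = teacher(**batch).logits

    # Soften both distributions with a temperature and match them with KL,
    # scaling by temperature**2 as in standard distillation practice.
    s = F.log_softmax(outputs.logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    kd_loss = F.kl_div(s, t, reduction="batchmean") * temperature**2

    return alpha * sft_loss + (1.0 - alpha) * kd_loss
```

In this kind of hybrid setup, `alpha` trades off imitation of the teacher against fitting the supervised targets directly; an RL stage (e.g., on verifiable math or code rewards) would typically be applied after this distillation/SFT phase rather than folded into the same loss.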