
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

February 13, 2026
Authors: Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
cs.AI

Abstract

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning, optimizing both correctness and execution efficiency. For deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models such as Qwen3-30B-A3B on some tasks. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
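The abstract does not give the exact form of the complexity-aware code reward, but the idea of jointly rewarding correctness and execution efficiency can be sketched as follows. All names, the reference-runtime normalization, and the `efficiency_weight` value are illustrative assumptions, not the paper's actual formulation.

```python
def complexity_aware_reward(passed: bool,
                            runtime: float,
                            runtime_ref: float,
                            efficiency_weight: float = 0.3) -> float:
    """Hypothetical sketch: combine correctness with an efficiency bonus.

    `runtime_ref` is the runtime of a reference solution; the weighting
    scheme here is an assumption for illustration only.
    """
    if not passed:
        return 0.0  # incorrect code earns no reward at all
    # Efficiency term: 1.0 when at least as fast as the reference,
    # decaying toward 0 as the candidate's runtime grows.
    efficiency = min(runtime_ref / max(runtime, 1e-9), 1.0)
    # Correct-but-slow solutions keep a base reward; faster ones earn more.
    return (1.0 - efficiency_weight) + efficiency_weight * efficiency
```

Under this sketch, a correct solution matching the reference speed earns the full reward of 1.0, a correct solution twice as slow earns 0.85, and any incorrect solution earns 0, so the RL objective pressures the model toward code that is both right and fast.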