
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

February 13, 2026
作者: Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
cs.AI

Abstract

We present Nanbeige4.1-3B, a unified generalist language model that achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning that optimize both correctness and efficiency. For deep-search tasks, we synthesize complex training data and incorporate turn-level supervision during training, enabling stable long-horizon tool interactions: Nanbeige4.1-3B reliably executes up to 600 tool-call turns for complex problem solving. Extensive experiments show that Nanbeige4.1-3B significantly outperforms models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses the roughly ten-times-larger Qwen3-30B-A3B. These results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
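The combination of point-wise and pair-wise reward signals mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the Bradley–Terry pairing, the `alpha` weighting, and the scoring functions are assumptions, not the paper's actual reward models.

```python
import math


def pointwise_reward(score: float) -> float:
    """Point-wise signal: an absolute quality score in [0, 1] for one response."""
    return score


def pairwise_reward(score_a: float, score_b: float) -> float:
    """Pair-wise signal: Bradley-Terry probability that response A beats B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def combined_reward(score_a: float, score_b: float, alpha: float = 0.5) -> float:
    """Blend A's absolute quality with its preference margin over B.

    `alpha` trades off point-wise vs. pair-wise supervision (assumed weighting).
    """
    return alpha * pointwise_reward(score_a) + (1 - alpha) * pairwise_reward(score_a, score_b)
```

The point-wise term anchors responses to an absolute quality scale, while the pair-wise term preserves preference orderings between competing responses.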
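A complexity-aware RL reward of the kind described for code generation could pair a correctness term with an efficiency penalty. The runtime-vs-baseline proxy, the gating rule, and the `efficiency_weight` below are illustrative assumptions:

```python
def complexity_aware_reward(tests_passed: int, tests_total: int,
                            runtime: float, baseline_runtime: float,
                            efficiency_weight: float = 0.3) -> float:
    """Reward correctness first, then penalize running slower than a baseline."""
    correctness = tests_passed / tests_total
    if correctness < 1.0:
        # Efficiency only matters once the code is fully correct (assumed rule).
        return correctness
    # Penalty grows with slowdown relative to the baseline, capped at efficiency_weight.
    slowdown = max(0.0, runtime / baseline_runtime - 1.0)
    penalty = efficiency_weight * min(1.0, slowdown)
    return correctness + efficiency_weight - penalty
```

Under this shaping, a correct-but-slow solution still beats an incorrect one, and matching or beating the baseline runtime earns the full efficiency bonus.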
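The long-horizon tool-interaction loop with turn-level supervision might look like the sketch below. `call_model`, `run_tool`, the action dictionary schema, and the per-turn reward rule are hypothetical stand-ins; only the 600-turn cap comes from the abstract.

```python
from typing import Callable

MAX_TURNS = 600  # cap on tool-call turns, matching the limit reported above


def agent_loop(call_model: Callable[[list], dict],
               run_tool: Callable[[dict], str]) -> tuple[list, list[float]]:
    """Run up to MAX_TURNS tool-call turns, recording a per-turn reward signal."""
    history: list = []
    turn_rewards: list[float] = []
    for _ in range(MAX_TURNS):
        action = call_model(history)  # model decides the next step
        if "final_answer" in action:
            history.append(action)
            break
        observation = run_tool(action)  # execute the requested tool call
        history.append({"action": action, "observation": observation})
        # Turn-level supervision: score every turn, not just the final outcome.
        turn_rewards.append(1.0 if observation != "error" else 0.0)
    return history, turn_rewards
```

Scoring each turn individually gives the trainer a dense signal over hundreds of steps, rather than a single sparse reward at the end of the episode.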