Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Abstract

We introduce Confucius3-Math, an open-source large language model with 14B parameters that (1) runs efficiently on a single consumer-grade GPU and (2) achieves state-of-the-art (SOTA) performance on a range of mathematical reasoning tasks, outperforming many significantly larger models. As part of our mission to enhance education and knowledge dissemination with AI, Confucius3-Math is dedicated to mathematics learning for Chinese K-12 students and educators. Built via post-training with large-scale reinforcement learning (RL), Confucius3-Math aligns with the national curriculum and excels at solving mainstream Chinese K-12 mathematical problems at low cost. In this report we share our development recipe, the challenges we encountered, and the techniques we developed to overcome them. In particular, we introduce three technical innovations: Targeted Entropy Regularization, Recent Sample Recovery, and Policy-Specific Hardness Weighting. These innovations encompass a new entropy regularization method, a novel data scheduling policy, and an improved group-relative advantage estimator. Collectively, they significantly stabilize RL training, improve data efficiency, and boost performance. Our work demonstrates the feasibility of building strong reasoning models in a particular domain at low cost. We open-source our model and code at https://github.com/netease-youdao/Confucius3-Math.
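The abstract names an improved group-relative advantage estimator but does not define it here. As background only, the following is a minimal sketch of the standard group-relative baseline that such estimators typically start from: each sampled answer's reward is normalized against the other samples for the same problem. The function name, the `eps` constant, and the 0/1 reward convention are illustrative assumptions, and the sketch does not include the paper's hardness weighting.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Standard group-relative baseline (assumed, not the paper's variant):
    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled solutions to one problem,
# with reward 1.0 for a correct final answer and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this baseline, a group whose samples all receive the same reward yields zero advantage for every sample, so such problems contribute no gradient signal.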
