Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
June 23, 2025
Authors: Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan
cs.AI
Abstract
We introduce Confucius3-Math, an open-source large language model with 14B
parameters that (1) runs efficiently on a single consumer-grade GPU; (2)
achieves state-of-the-art (SOTA) performance on a range of mathematical reasoning tasks,
outperforming many models with significantly larger sizes. In particular, as
part of our mission to enhance education and knowledge dissemination with AI,
Confucius3-Math is specifically committed to mathematics learning for Chinese
K-12 students and educators. Built via post-training with large-scale
reinforcement learning (RL), Confucius3-Math aligns with the national curriculum
and excels at solving mainstream Chinese K-12 mathematical problems at low
cost. In this report we share our development recipe, the challenges we
encountered, and the techniques we developed to overcome them. In particular, we
introduce three technical innovations: Targeted Entropy Regularization, Recent
Sample Recovery, and Policy-Specific Hardness Weighting. Respectively, these
are a new entropy regularization method, a novel data scheduling policy, and an
improved group-relative advantage estimator. Collectively, they significantly
stabilize RL training, improve data efficiency, and boost performance. Our
work demonstrates the feasibility of building strong reasoning models in a
particular domain at low cost. We open-source our model and code at
https://github.com/netease-youdao/Confucius3-Math.
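
For readers unfamiliar with the term, the following is a minimal sketch of a plain group-relative advantage estimator of the kind the abstract says is improved upon. It is not the authors' Policy-Specific Hardness Weighting, whose details are not given here. The function name, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Illustrative baseline (assumed form, not the paper's estimator).

    Each reward is normalized by the mean and population standard
    deviation of the group of responses sampled for the same problem.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one problem
# (1.0 = correct, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch, correct answers receive positive advantages and incorrect ones negative, while a group whose answers are all correct or all incorrect yields zero advantage for every sample and therefore no learning signal.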