Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
October 25, 2025
Authors: Ling-Team, Ang Li, Ben Liu, Binbin Hu, Bing Li, Bingwei Zeng, Borui Ye, Caizhi Tang, Changxin Tian, Chao Huang, Chao Zhang, Chen Qian, Chenchen Ju, Chenchen Li, Chengfu Tang, Chili Fu, Chunshao Ren, Chunwei Wu, Cong Zhang, Cunyin Peng, Dafeng Xu, Daixin Wang, Dalong Zhang, Dingnan Jin, Dingyuan Zhu, Dongke Hu, Fangzheng Zhao, Feifan Wu, Feng Zhu, Gangshan Wang, Haitao Zhang, Hailin Zhao, Hanxiao Zhang, Hanzi Wang, Hao Qian, Haoyi Yu, Heng Zhang, Hongliang Zhang, Hongzhi Luan, Huirong Dong, Huizhong Li, Jia Li, Jia Liu, Jialong Zhu, Jian Sha, Jianping Wei, Jiaolong Yang, Jieyue Ma, Jiewei Wu, Jinjing Huang, Jingyun Tian, Jingyuan Zhang, Jinquan Sun, Juanhui Tu, Jun Liu, Jun Xu, Jun Zhou, Junjie Ou, Junpeng Fang, Kaihong Zhang, Kaiqin Hu, Ke Shi, Kun Tang, Kunlong Chen, Lanyin Mei, Lei Liang, Lei Xu, Libo Zhang, Lin Ju, Lin Yuan, Ling Zhong, Lintao Ma, Lu Liu, Lu Yu, Lun Cai, Meiqi Zhu, Mengying Li, Min Chen, Minghao Xue, Minghong Cai, Mingming Yin, Peijie Jiang, Peilong Zhao, Pingping Liu, Qian Zhao, Qing Cui, Qingxiang Huang, Qingyuan Yang, Quankun Yu, Shaowei Wei, Shijie Lian, Shoujian Zheng, Shun Song, Shungen Zhang, Shuo Zhang, Siyuan Li, Song Liu, Ting Guo, Tong Zhao, Wanli Gu, Weichang Wu, Weiguang Han, Wenjing Fang, Wubin Wang, Xiang Shu, Xiao Shi, Xiaoshun Lan, Xiaolu Zhang, Xiaqing Sun, Xin Zhao, Xingyu Lu, Xiong Xu, Xudong Wang, Xudong Wang, Xuemin Yang, Yajie Yang, Yang Xiang, Yanzhe Li, Yi Zhang, Yilong Wang, Yingxue Li, Yongzhen Guo, Yuzhuo Fu, Yuanyuan Wang, Yue Yang, Yue Yu, Yufeng Deng, Yun Zhang, Yunfei Xu, Yuqi Zhang, Yuxiao He, Zengke Gui, Zhaoxin Huan, Zhaoyang Wang, Zhibo Zhu, Zhihao Wang, Zhiqiang Zhang, Zhoufei Wang, Zihang Zeng, Ziqi Liu, Zitao Xuan, Zuoli Tang
cs.AI
Abstract
We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models, Ling-mini-2.0, Ling-flash-2.0, and Ling-1T, ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with multi-token prediction (MTP) for efficient reasoning, reasoning-oriented data and mid-training chain-of-thought (CoT) activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
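
As context for the sparse-activation claim above, the sketch below shows how a high-sparsity MoE layer routes each token to only a few experts, so the compute actually executed per token is a small fraction of the total parameter count. This is a minimal illustrative example in PyTorch; the expert count, top-k value, and hidden sizes are hypothetical stand-ins and are not Ling 2.0's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Minimal top-k MoE layer: each token activates only k of E experts.

    Illustrative only -- the expert count, k, and dimensions are made up,
    not Ling 2.0's real hyperparameters.
    """
    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: [tokens, d_model]
        scores = self.router(x)                   # [tokens, num_experts]
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # normalize over selected experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]               # expert chosen for this slot, per token
            w = weights[:, slot].unsqueeze(-1)
            for e in idx.unique():
                mask = idx == e                   # tokens routed to expert e in this slot
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out

# With 64 experts and top-4 routing, each token touches roughly 4/64 = 1/16 of
# the expert parameters, which is where the "active compute" savings come from.
layer = SparseMoELayer()
tokens = torch.randn(8, 1024)
print(layer(tokens).shape)  # torch.Size([8, 1024])
```

In practice, production MoE implementations batch tokens per expert with fused kernels rather than looping as above; the loop here only keeps the sparsity idea easy to see.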