Scaling Latent Reasoning via Looped Language Models
October 29, 2025
Authors: Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian
cs.AI
Abstract
Modern LLMs are trained to "think" primarily via explicit text generation,
such as chain-of-thought (CoT), which defers reasoning to post-training and
under-leverages pre-training data. We present and open-source Ouro, named after
the recursive Ouroboros, a family of pre-trained Looped Language Models
(LoopLM) that instead build reasoning into the pre-training phase through (i)
iterative computation in latent space, (ii) an entropy-regularized objective
for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and
2.6B models match the results of state-of-the-art LLMs with up to 12B parameters
across a wide range of benchmarks. Through controlled experiments, we show
this advantage stems not from increased knowledge capacity, but from superior
knowledge manipulation capabilities. We also show that LoopLM yields reasoning
traces more aligned with final outputs than explicit CoT. We hope our results
show the potential of LoopLM as a novel scaling direction in the reasoning era.
The models are available at http://ouro-llm.github.io.
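
The abstract names two mechanisms, iterative computation in latent space and an entropy-regularized objective for depth allocation, without giving their exact form. The toy sketch below is only an illustration of how such pieces might fit together, not the authors' implementation. Everything in it is an assumption made for the example: the names `shared_block`, `exit_distribution`, and `looped_objective`, the scalar weight `w`, the softmax over gate logits, the mean-squared-error step loss, and the coefficient `beta`.

```python
import math

def shared_block(h, w):
    """One weight-tied step: the same parameter w is reused on every loop.
    (Toy residual update; a real model would use a Transformer block.)"""
    return [math.tanh(w * x + x) for x in h]

def exit_distribution(gate_logits):
    """Map per-step gate logits to a distribution over exit steps.
    A plain softmax is assumed here purely for illustration."""
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def looped_objective(h0, w, gate_logits, step_loss, beta):
    """Expected per-step loss under the exit distribution, minus beta * entropy.
    The entropy bonus discourages the exit distribution from collapsing
    onto a single loop count."""
    h = h0
    losses = []
    for _ in gate_logits:          # one loop iteration per candidate exit step
        h = shared_block(h, w)     # latent state is refined in place
        losses.append(step_loss(h))
    p = exit_distribution(gate_logits)
    expected = sum(pi * li for pi, li in zip(p, losses))
    return expected - beta * entropy(p), p, losses

# Hypothetical usage: four loop steps, uniform gate logits, MSE to a toy target.
target = [0.5, -0.5]
mse = lambda h: sum((a - b) ** 2 for a, b in zip(h, target)) / len(h)
obj, p, losses = looped_objective([0.1, -0.2], 0.3, [0.0] * 4, mse, beta=0.1)
```

With uniform gate logits the exit distribution is uniform, so the objective reduces to the mean of the per-step losses minus `beta * log(4)`; training the gate logits would shift probability mass toward the loop counts that pay off for a given input.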