Scaling Latent Reasoning via Looped Language Models
October 29, 2025
Authors: Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian
cs.AI
Abstract
Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models achieve superior performance, matching the results of SOTA LLMs of up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show that this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more closely aligned with its final outputs than explicit CoT. We hope these results demonstrate the potential of LoopLM as a novel scaling direction in the reasoning era. Our models can be found at http://ouro-llm.github.io.
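To make the mechanism concrete, here is a minimal PyTorch sketch of the LoopLM idea: a single parameter-shared block applied for several recurrent steps in latent space, with a halting head whose distribution over exit depths is trained under an entropy regularizer. All names (`LoopedLM`, `shared_block`, `exit_head`, `max_loops`, `lambda_ent`) and the exact loss form are illustrative assumptions, not the actual Ouro implementation.

```python
# Illustrative sketch only: a looped decoder with an entropy-regularized
# halting head. Names and the loss form are assumptions for exposition,
# not the Ouro architecture or training objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoopedLM(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8,
                 max_loops=4, lambda_ent=0.01):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One parameter-shared block reused at every recurrence step
        # (iterative computation in latent space; causal masking omitted for brevity).
        self.shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.exit_head = nn.Linear(d_model, 1)   # scores "exit at this depth"
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.max_loops = max_loops
        self.lambda_ent = lambda_ent

    def forward(self, input_ids, labels=None):
        h = self.embed(input_ids)
        exit_scores, step_logits = [], []
        for _ in range(self.max_loops):
            h = self.shared_block(h)             # reuse the same weights each loop
            exit_scores.append(self.exit_head(h).mean(dim=(1, 2)))  # one score per sequence
            step_logits.append(self.lm_head(h))
        # Learned depth allocation: a distribution over recurrence depths.
        p_exit = F.softmax(torch.stack(exit_scores, dim=-1), dim=-1)  # (batch, max_loops)
        if labels is None:
            return step_logits[-1], p_exit
        # Expected LM loss over depths, plus an entropy bonus that discourages
        # the depth distribution from collapsing prematurely.
        losses = torch.stack([
            F.cross_entropy(l.transpose(1, 2), labels, reduction="none").mean(dim=1)
            for l in step_logits
        ], dim=-1)                               # (batch, max_loops)
        expected_loss = (p_exit * losses).sum(dim=-1).mean()
        entropy = -(p_exit * (p_exit + 1e-9).log()).sum(dim=-1).mean()
        return expected_loss - self.lambda_ent * entropy
```

At inference time, such a model can stop looping once the halting distribution concentrates on the current depth, spending fewer latent iterations on easy inputs and more on hard ones; this is the sense in which depth allocation is "learned" rather than fixed.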