

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

January 6, 2026
作者: Yihong Liu, Raoyuan Zhao, Hinrich Schütze, Michael A. Hedderich
cs.AI

Abstract

Large reasoning models (LRMs) achieve strong performance on mathematical reasoning tasks, often attributed to their capability to generate explicit chain-of-thought (CoT) explanations. However, recent work shows that LRMs often arrive at the correct answer before completing these textual reasoning steps, indicating the presence of latent reasoning -- internal, non-verbal computation encoded in hidden states. While this phenomenon has been explored in English, its multilingual behavior remains largely unknown. In this paper, we conduct a systematic investigation of multilingual latent reasoning in LRMs across 11 languages. Using a truncation-based strategy, we examine how the correct answer emerges as the model is given only partial reasoning traces, allowing us to measure stepwise latent prediction formation. Our results reveal clear evidence of multilingual latent reasoning, though unevenly: strong in resource-rich languages, weaker in low-resource ones, and broadly less observable on harder benchmarks. To understand whether these differences reflect distinct internal mechanisms, we further perform representational analyses. Despite surface-level disparities, we find that the internal evolution of predictions is highly consistent across languages and broadly aligns with English -- a pattern suggesting an English-centered latent reasoning pathway.