Geometric Latent Reasoning Induces Shorter Outputs in Large Language Models

Abstract


Large language models solve complex problems by generating long chains of explicit reasoning tokens. Although effective, this approach makes reasoning expensive, length-sensitive, and confined to (discrete) natural language. Latent reasoning offers a continuous alternative, but determining useful structures for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations of Qwen3 models on mathematical reasoning benchmarks reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without any explicit length objective. By replacing early explicit reasoning with continuous latent steps, the models often reach correct answers in considerably fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, and they expose a new tradeoff between latent computation budget, output length, and accuracy.
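To make the geometric framing concrete, the following is a toy sketch, not the GLR implementation. It assumes a random embedding matrix `E` standing in for the pretrained token-embedding space, a hypothetical linear transition head `f(h) = W h + b` in place of the paper's lightweight head, and residual direction updates `h_{t+1} = h_t + f(h_t)`. The head is fitted by teacher-forced least squares so that each update moves the latent state onto the next chain-of-thought anchor embedding; the rollout then runs freely from `h0`. All function names (`fit_linear_head`, `rollout`, `anchor_loss`, `nearest_tokens`) are illustrative assumptions.

```python
import numpy as np


def fit_linear_head(h0, E, anchor_ids):
    """Fit a hypothetical linear transition head f(h) = W h + b by least squares.

    Teacher forcing: the input at step t is the previous anchor embedding
    (h0 at t = 0); the target is the direction update onto the next anchor.
    """
    targets = E[anchor_ids]                              # (T, d) anchor embeddings
    inputs = np.vstack([h0, targets[:-1]])               # (T, d) teacher-forced inputs
    deltas = targets - inputs                            # (T, d) direction updates
    X = np.hstack([inputs, np.ones((len(inputs), 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(X, deltas, rcond=None)    # (d + 1, d)
    W, b = coef[:-1].T, coef[-1]
    return W, b


def rollout(h0, W, b, steps):
    """Free-running latent rollout with residual updates h <- h + (W h + b)."""
    path, h = [], h0
    for _ in range(steps):
        h = h + W @ h + b
        path.append(h)
    return np.stack(path)                                # (steps, d)


def anchor_loss(path, E, anchor_ids):
    """Mean squared distance between latent states and their anchor embeddings."""
    return float(np.mean(np.sum((path - E[anchor_ids]) ** 2, axis=1)))


def nearest_tokens(path, E):
    """Decode each continuous latent state to its nearest token embedding."""
    d2 = ((path[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)  # (steps, V)
    return d2.argmin(axis=1).tolist()


# Toy usage: 20-token vocabulary, 8-dimensional embeddings, 4 anchor tokens.
rng = np.random.default_rng(0)
E = rng.normal(size=(20, 8))
h0 = rng.normal(size=8)
anchor_ids = [3, 11, 7, 15]
W, b = fit_linear_head(h0, E, anchor_ids)
path = rollout(h0, W, b, steps=len(anchor_ids))
print(nearest_tokens(path, E), anchor_loss(path, E, anchor_ids))
```

With only four anchors in eight dimensions the linear head can fit the path exactly, so the rollout lands on the anchors. A trained head on real traces would instead be expected to stay near, rather than on, the anchor embeddings; that gap is the kind of continuous deviation the abstract describes, and `nearest_tokens` shows how such off-token states could still be read back as tokens.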