Geometric Latent Reasoning Induces Shorter Generations in LLMs

Abstract

Large language models solve complex problems by generating long chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and constrained to discrete natural language. Latent reasoning offers a continuous alternative, but determining useful structures for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces markedly shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using substantially fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
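
To make the idea of iterative direction updates concrete, the following is a minimal NumPy sketch, not the implementation evaluated in the paper. It assumes a hypothetical two-layer transition head (`transition_head`), a fixed step size, unit-norm directions, and a squared-distance loss against anchor embeddings (`anchor_loss`). None of these choices is specified in the abstract, and random vectors stand in for the pretrained token embeddings of a chain-of-thought trace.

```python
import numpy as np

def transition_head(state, W1, W2):
    """Hypothetical two-layer MLP mapping a latent state to a unit direction."""
    hidden = np.tanh(state @ W1)
    raw = hidden @ W2
    return raw / (np.linalg.norm(raw) + 1e-8)

def latent_rollout(start, W1, W2, num_steps, step_size=0.1):
    """Move a point through embedding space along predicted directions."""
    path = [start]
    for _ in range(num_steps):
        direction = transition_head(path[-1], W1, W2)
        path.append(path[-1] + step_size * direction)
    return np.stack(path)  # shape: (num_steps + 1, dim)

def anchor_loss(path, anchors):
    """Mean squared distance between latent steps and anchor embeddings.

    Assumed objective: the latent point may sit between token embeddings,
    so the loss measures closeness to the anchors rather than exact match.
    """
    return float(np.mean(np.sum((path[1:] - anchors) ** 2, axis=-1)))

# Toy setup: random weights and random anchors (placeholders, not real data).
rng = np.random.default_rng(0)
dim, hidden_dim, num_steps = 8, 16, 4
W1 = rng.normal(scale=0.5, size=(dim, hidden_dim))
W2 = rng.normal(scale=0.5, size=(hidden_dim, dim))
start = rng.normal(size=dim)
anchors = rng.normal(size=(num_steps, dim))

path = latent_rollout(start, W1, W2, num_steps)
loss = anchor_loss(path, anchors)
```

In this sketch the number of latent steps (`num_steps`) plays the role of the latent computation budget; in a trained system the weights would be fitted so that the rolled-out path tracks the anchor trajectory.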