Geometric Latent Reasoning Induces Shorter Generations in Large Language Models

Abstract

Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. Although effective, this approach makes reasoning expensive, length-sensitive, and constrained to discrete natural language. Latent reasoning offers a continuous alternative, but determining a useful structure for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers in far fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
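
To make the idea of iterative direction updates anchored to token embeddings concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the transition head, the step rule, or the training loss, so everything here is an assumption made for illustration: the single linear layer with tanh (`transition_head`), the unit-norm direction with a fixed `step_size`, the squared-distance `anchor_loss`, and the helper `nearest_token`.

```python
import math

def transition_head(z, W):
    """Hypothetical transition head: one linear map followed by tanh (an assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]

def latent_rollout(z0, W, num_steps, step_size=1.0):
    """Repeatedly move the latent state along the predicted direction."""
    states = [list(z0)]
    z = list(z0)
    for _ in range(num_steps):
        d = transition_head(z, W)
        norm = math.sqrt(sum(x * x for x in d))
        if norm > 0.0:
            # Keep only the direction; a zero prediction leaves the state unchanged.
            d = [x / norm for x in d]
        z = [zi + step_size * di for zi, di in zip(z, d)]
        states.append(z)
    return states

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def anchor_loss(states, anchors):
    """Mean squared distance between latent states and their anchor embeddings."""
    return sum(squared_distance(s, a) for s, a in zip(states, anchors)) / len(states)

def nearest_token(state, embedding_table):
    """Index of the token embedding closest to a (possibly off-token) latent state."""
    return min(range(len(embedding_table)),
               key=lambda i: squared_distance(state, embedding_table[i]))
```

In this toy setup, the anchors passed to `anchor_loss` would be the embeddings of the tokens in a textual chain-of-thought trace, and `nearest_token` shows how a continuous state can sit between exact token embeddings while still being close to one of them.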