Geometric Latent Reasoning Induces Shorter Generations in Large Language Models

Abstract

Large language models solve complex problems by generating long chains of explicit reasoning tokens. This is effective, but it makes reasoning expensive and length-sensitive, and it confines reasoning to (discrete) natural language. Latent reasoning offers a continuous alternative, yet determining what structure makes intermediate latent states useful remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks with Qwen3 models reveal an emergent phenomenon: geometric latent reasoning substantially shortens generations without any explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers in far fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, and they expose a new tradeoff among latent computation budget, output length, and accuracy.
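To make the geometric formulation concrete, here is a minimal, hypothetical sketch of an anchored latent rollout. It is not the authors' implementation. It assumes a linear transition head (`transition_head`, with hypothetical parameters `W` and `b`) that maps the current latent state to a unit-norm direction, a fixed `step_size`, and a squared-distance `anchor_loss` against the embeddings of the chain-of-thought tokens. Because the loss only penalizes distance, intermediate states are free to deviate from exact token embeddings. Plain Python lists stand in for tensors.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transition_head(W: Matrix, b: Vector, z: Vector) -> Vector:
    """Hypothetical linear head: predict a unit-norm update direction from state z."""
    raw = [r + bi for r, bi in zip(matvec(W, z), b)]
    n = math.sqrt(sum(x * x for x in raw))
    # Assumption: a zero output means "no move" rather than an undefined direction.
    return [x / n for x in raw] if n > 0 else [0.0] * len(raw)


def rollout(W: Matrix, b: Vector, z0: Vector, num_steps: int, step_size: float) -> List[Vector]:
    """Iterate z_{k+1} = z_k + step_size * direction(z_k); states stay continuous."""
    states = []
    z = list(z0)
    for _ in range(num_steps):
        d = transition_head(W, b, z)
        z = [zi + step_size * di for zi, di in zip(z, d)]
        states.append(z)
    return states


def anchor_loss(states: List[Vector], anchors: List[Vector]) -> float:
    """Mean squared distance between latent states and CoT token embeddings (anchors)."""
    if len(states) != len(anchors):
        raise ValueError("need one anchor embedding per latent step")
    total = 0.0
    for z, e in zip(states, anchors):
        total += sum((zi - ei) ** 2 for zi, ei in zip(z, e))
    return total / len(states)
```

Under these assumptions, training would adjust `W` and `b` to minimize `anchor_loss` (for example with an autodiff framework), and at inference the rolled-out states would be fed to the model as input embeddings in place of early explicit reasoning tokens. A residual head that predicts unnormalized updates is an equally plausible reading of "direction updates"; the unit-norm choice here simply decouples the direction of each step from its length.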