Let's Predict Sentence by Sentence

Abstract

Autoregressive language models (LMs) generate one token at a time, yet human reasoning operates over higher-level abstractions such as sentences, propositions, and concepts. This contrast raises a central question: can LMs likewise learn to reason over structured semantic units rather than raw token sequences? In this work, we investigate whether pretrained LMs can be lifted into such abstract reasoning spaces by building on their learned representations. We present a framework that adapts a pretrained token-level LM to operate in sentence space by autoregressively predicting continuous embeddings of next sentences. We explore two embedding paradigms inspired by classical representation learning: 1) semantic embeddings, learned via autoencoding to preserve surface meaning; and 2) contextual embeddings, trained via next-sentence prediction to encode anticipatory structure. We evaluate both under two inference regimes: Discretized, which decodes each predicted embedding into text and then re-encodes it; and Continuous, which reasons entirely in embedding space for improved efficiency. Across four domains (mathematics, logic, commonsense, and planning), contextual embeddings under continuous inference achieve performance competitive with Chain-of-Thought (CoT) while halving inference-time FLOPs on average. We also present early signs of scalability and modular adaptation. Finally, to visualize latent trajectories, we introduce SentenceLens, a diagnostic tool that decodes intermediate model states into interpretable sentences. Together, our results indicate that pretrained LMs can effectively transition to abstract, structured reasoning within latent embedding spaces.
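The abstract describes the two inference regimes only at a high level. The sketch below is a toy illustration of the difference in loop structure, not the authors' implementation: the 2-D `CODEBOOK`, the nearest-neighbour `decode`, the lookup `encode`, the drifting `predict_next`, and the fixed number of reasoning steps are all hypothetical stand-ins for a learned encoder, decoder, and adapted LM. It shows that the Discretized loop leaves and re-enters embedding space at every step, while the Continuous loop feeds each predicted embedding straight back in and decodes nothing until the caller chooses to.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy sentence space: each sentence has a fixed 2-D embedding.
# A real system would use learned semantic or contextual embeddings instead.
CODEBOOK: Dict[str, Vector] = {
    "Let x = 3.": (0.0, 0.0),
    "Then 2x = 6.": (1.0, 0.0),
    "So 2x + 1 = 7.": (2.0, 0.0),
    "The answer is 7.": (3.0, 0.0),
}

# Counts how often we leave embedding space (decode) or re-enter it (encode).
calls = {"encode": 0, "decode": 0}


def encode(sentence: str) -> Vector:
    """Toy encoder: look up the sentence's embedding."""
    calls["encode"] += 1
    return CODEBOOK[sentence]


def decode(vec: Vector) -> str:
    """Toy decoder: return the sentence whose embedding is nearest to vec."""
    calls["decode"] += 1
    return min(CODEBOOK, key=lambda s: sum((a - b) ** 2 for a, b in zip(CODEBOOK[s], vec)))


def predict_next(context: Sequence[Vector]) -> Vector:
    """Toy stand-in for the adapted LM: step forward from the last embedding,
    with a small drift so predictions land near, not exactly on, codebook points."""
    last = context[-1]
    return (last[0] + 1.0, last[1] + 0.1)


def discretized(prompt: List[str], steps: int) -> List[str]:
    """Decode each predicted embedding to text, then re-encode it as the next input."""
    context = [encode(s) for s in prompt]
    output = []
    for _ in range(steps):
        sentence = decode(predict_next(context))
        output.append(sentence)
        context.append(encode(sentence))
    return output


def continuous(prompt: List[str], steps: int) -> List[Vector]:
    """Feed each predicted embedding straight back in; no text round-trip per step."""
    context = [encode(s) for s in prompt]
    output = []
    for _ in range(steps):
        vec = predict_next(context)
        output.append(vec)
        context.append(vec)
    return output


if __name__ == "__main__":
    calls.update(encode=0, decode=0)
    print(discretized(["Let x = 3."], steps=3), calls)
    # ['Then 2x = 6.', 'So 2x + 1 = 7.', 'The answer is 7.'] {'encode': 4, 'decode': 3}

    calls.update(encode=0, decode=0)
    latents = continuous(["Let x = 3."], steps=3)
    print(calls)  # {'encode': 1, 'decode': 0}
    print([decode(v) for v in latents])  # decoded only afterwards, for inspection
```

In this toy, re-encoding snaps the Discretized trajectory back onto codebook points at every step, whereas the Continuous trajectory keeps its small drift and skips the per-step decode/encode passes. The call counts only illustrate where the savings come from; they do not reproduce the FLOPs comparison reported in the abstract.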