System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

Abstract

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly: they fail to distinguish critical deductions from auxiliary steps, which leads to suboptimal use of computational resources. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) reasons adaptively along the vertical depth by letting non-critical tokens exit early through lightweight adapter branches, while critical tokens continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the effectiveness of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.
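To make the depth shortcut (DS) idea more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical per-token router score and a hypothetical lightweight adapter used as the early-exit branch. The names `make_layer`, `router_score`, `adapter`, `depth_shortcut_forward`, and `threshold` are all illustrative and do not come from the paper. The step shortcut (SS) and the two-stage self-distillation are not modeled here.

```python
import math
import random


def make_layer(dim, rng):
    """Build a toy stand-in for one Transformer layer (residual + tanh of a linear map)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def layer(h):
        return [h[i] + math.tanh(sum(w[i][j] * h[j] for j in range(dim)))
                for i in range(dim)]

    return layer


def router_score(h):
    """Hypothetical criticality score in (0, 1): sigmoid of the mean activation."""
    return 1.0 / (1.0 + math.exp(-sum(h) / len(h)))


def adapter(h):
    """Hypothetical lightweight exit branch: a cheap elementwise rescaling."""
    return [0.5 * x for x in h]


def depth_shortcut_forward(h, layers, threshold=0.5):
    """Pass one token's hidden state through the stack, exiting early when
    the router score drops below `threshold`.

    Returns (output_state, number_of_full_layers_executed).
    """
    for depth, layer in enumerate(layers):
        if router_score(h) < threshold:
            return adapter(h), depth  # treated as non-critical: take the shortcut
        h = layer(h)                  # treated as critical: keep going deeper
    return h, len(layers)


rng = random.Random(0)
layers = [make_layer(4, rng) for _ in range(6)]
token = [rng.uniform(-1.0, 1.0) for _ in range(4)]

# threshold=0.0 never triggers an exit; threshold=1.0 always exits immediately.
_, full_depth = depth_shortcut_forward(token, layers, threshold=0.0)
_, early_depth = depth_shortcut_forward(token, layers, threshold=1.0)
print(full_depth, early_depth)  # prints "6 0"
```

In this toy setup the threshold trades compute for depth: a token judged non-critical skips the remaining layers entirely, which is the kind of per-token saving the abstract attributes to the depth shortcut. How the actual router is parameterized and trained is not specified in the abstract.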
