System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

Abstract

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly: they do not distinguish critical deductions from auxiliary steps, so computational resources are used suboptimally. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) reasons adaptively along the vertical depth by letting non-critical tokens exit early through lightweight adapter branches, while critical tokens continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the superior performance of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.
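
The depth shortcut described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how criticality is decided, so the `router` score, the `threshold`, and the `ToyLayer` structure are all hypothetical, and plain Python functions stand in for Transformer blocks and adapter branches. The sketch only shows the control flow, in which a non-critical token leaves through a cheap branch while a critical token continues through every layer. The step shortcut is not covered here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class ToyLayer:
    """One layer with an expensive path and a cheap path (both are stand-ins)."""
    full: Callable[[Vector], Vector]     # stands in for a Transformer block
    adapter: Callable[[Vector], Vector]  # stands in for a lightweight adapter branch
    router: Callable[[Vector], float]    # hypothetical criticality score in [0, 1]


def depth_shortcut(
    h: Vector, layers: List[ToyLayer], threshold: float = 0.5
) -> Tuple[Vector, int]:
    """Pass one token's hidden state through the stack with early exit.

    Returns the final hidden state and the number of full layers executed.
    """
    full_layers_used = 0
    for layer in layers:
        if layer.router(h) < threshold:
            # Assumed rule: a low score means non-critical, so the token
            # exits through the adapter and skips all deeper layers.
            return layer.adapter(h), full_layers_used
        h = layer.full(h)
        full_layers_used += 1
    return h, full_layers_used


def make_toy_layer() -> ToyLayer:
    # Toy convention: h[0] carries the criticality signal and is left
    # unchanged; the remaining coordinates are doubled by the full path.
    return ToyLayer(
        full=lambda h: [h[0]] + [2.0 * x for x in h[1:]],
        adapter=lambda h: list(h),
        router=lambda h: h[0],
    )


layers = [make_toy_layer() for _ in range(3)]
critical_state, critical_cost = depth_shortcut([0.9, 1.0], layers)
trivial_state, trivial_cost = depth_shortcut([0.1, 1.0], layers)
print(critical_cost, trivial_cost)
```

In this toy setup the critical token runs all three full layers and the trivial token runs none, which is the kind of uneven compute allocation the abstract describes. In a trained model the routing decision would be learned rather than read off a fixed coordinate.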
