System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

May 25, 2025
Authors: Xiaoqiang Wang, Suyuchen Wang, Yun Zhu, Bang Liu
cs.AI

Abstract

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly, failing to distinguish critical deductions from auxiliary steps and resulting in suboptimal use of computational resources. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) adaptively reasons along the vertical depth by early exiting non-critical tokens through lightweight adapter branches, while allowing critical tokens to continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across the decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the superior performance of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.
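
The two shortcut types described in the abstract can be pictured with a small toy sketch. This is not the paper's architecture: the router score, both thresholds, the number of layers, and the `full_layer` / `adapter` transforms are all hypothetical stand-ins chosen only so the routing decisions are visible. It assumes a per-token criticality score decides (a) whether a decoding step is trivial enough to reuse the previous step's hidden state (step shortcut) and (b) at which layer a token leaves the deep stack through a lightweight adapter (depth shortcut).

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


def full_layer(h: Vector) -> Vector:
    """Stand-in for an expensive Transformer layer (hypothetical toy transform)."""
    return [0.5 * x + 1.0 for x in h]


def adapter(h: Vector) -> Vector:
    """Stand-in for a lightweight adapter branch (hypothetical toy transform)."""
    return [x + 0.1 for x in h]


def router_score(h: Vector) -> float:
    """Toy criticality score in [0, 1); a real router would be learned."""
    mean_abs = sum(abs(x) for x in h) / len(h)
    return mean_abs / (1.0 + mean_abs)


@dataclass
class StepTrace:
    hidden: Vector          # hidden state produced by this latent step
    full_layers_used: int   # how many expensive layers were actually run
    reused_previous: bool   # True if the step shortcut fired


def latent_step(
    h: Vector,
    prev: Optional[StepTrace],
    num_layers: int = 4,           # assumed depth of the toy stack
    exit_threshold: float = 0.5,   # assumed depth-shortcut threshold
    skip_threshold: float = 0.2,   # assumed step-shortcut threshold
) -> StepTrace:
    # Step shortcut (SS): a trivial step reuses the previous step's hidden
    # state instead of running any layer.
    if prev is not None and router_score(h) < skip_threshold:
        return StepTrace(list(prev.hidden), 0, True)

    used = 0
    for _ in range(num_layers):
        # Depth shortcut (DS): a non-critical token exits early through the
        # adapter; a critical token continues through the full layer.
        if router_score(h) < exit_threshold:
            h = adapter(h)
            break
        h = full_layer(h)
        used += 1
    return StepTrace(h, used, False)


if __name__ == "__main__":
    critical = latent_step([4.0, 4.0], None)
    auxiliary = latent_step([0.5, 0.5], critical)
    trivial = latent_step([0.1, 0.1], auxiliary)
    for name, trace in [("critical", critical), ("auxiliary", auxiliary), ("trivial", trivial)]:
        print(name, trace.full_layers_used, trace.reused_previous)
```

In this toy run the high-score input passes through all four full layers, the mid-score input exits immediately through the adapter, and the low-score input reuses the previous hidden state without running any layer. That is the kind of uneven compute allocation the abstract describes, under the assumptions stated above.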
