System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

Abstract

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly: they do not distinguish critical deductions from auxiliary steps, so computation is allocated suboptimally. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) adaptively reasons along the vertical depth by early exiting non-critical tokens through lightweight adapter branches, while allowing critical tokens to continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the superior performance of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.
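
As a rough illustration of the depth shortcut (DS) idea described above, the following toy sketch routes a single token's hidden state through a stack of layers and lets it exit early through a cheap adapter branch when a router deems it non-critical. This is not the paper's implementation: the `Layer` structure, the `router` score, the `threshold` parameter, and the toy layer functions are all hypothetical stand-ins chosen for illustration, and the step shortcut (SS) is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class Layer:
    """Hypothetical layer with two paths and a router (all assumptions)."""
    full: Callable[[Vector], Vector]     # stand-in for a full Transformer layer
    adapter: Callable[[Vector], Vector]  # stand-in for a lightweight adapter branch
    router: Callable[[Vector], float]    # assumed criticality score in [0, 1]


def depth_shortcut(hidden: Vector, layers: List[Layer],
                   threshold: float = 0.5) -> Tuple[Vector, int]:
    """Run one token through the stack.

    Returns the final hidden state and the number of full layers used.
    """
    full_layers_used = 0
    for layer in layers:
        if layer.router(hidden) < threshold:
            # Non-critical token: exit early through the adapter branch
            # and skip all remaining layers.
            return layer.adapter(hidden), full_layers_used
        # Critical token: continue through the full layer.
        hidden = layer.full(hidden)
        full_layers_used += 1
    return hidden, full_layers_used


# Toy stack: the "full" path adds 1 to every element, the adapter copies
# its input, and the router treats a positive first element as critical.
toy_layers = [
    Layer(
        full=lambda h: [x + 1.0 for x in h],
        adapter=lambda h: list(h),
        router=lambda h: 1.0 if h[0] > 0 else 0.0,
    )
    for _ in range(3)
]

critical_out, critical_cost = depth_shortcut([1.0, 2.0], toy_layers)
trivial_out, trivial_cost = depth_shortcut([-1.0, 2.0], toy_layers)
```

In this toy setup the critical token passes through all three full layers, while the non-critical token exits at the first layer without using any full layer, which is the kind of per-token compute saving the depth shortcut is meant to provide.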
