SIM-CoT: Supervised Implicit Chain-of-Thought
September 24, 2025
作者: Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Jiaqi Wang, Xipeng Qiu, Dahua Lin
cs.AI
Abstract
Implicit Chain-of-Thought (CoT) methods offer a promising, token-efficient
alternative to explicit CoT reasoning in Large Language Models (LLMs), but a
persistent performance gap has limited their adoption. We
identify a core latent instability issue by scaling the computational budget of
implicit CoT approaches: as we increase the number of implicit reasoning tokens
to enhance performance, the training process often becomes unstable and
collapses. Our analysis reveals that this instability arises from the latent
representations becoming homogeneous and losing their semantic diversity, a
failure caused by insufficient step-level supervision in existing implicit CoT
approaches. To address this issue, we propose SIM-CoT, a plug-and-play training
module that introduces step-level supervision to stabilize and enrich the
latent reasoning space. Specifically, SIM-CoT employs an auxiliary decoder
during training to align each implicit token with its corresponding explicit
reasoning step, ensuring that latent states capture distinct and meaningful
information. The proposed auxiliary decoder is removed during inference,
preserving the computational efficiency of implicit CoT methods with no added
overhead. In addition, the auxiliary decoder provides interpretability for
implicit reasoning by projecting each latent token onto an explicit reasoning
vocabulary, enabling per-step visualization and diagnosis of semantic roles.
SIM-CoT significantly enhances both the in-domain accuracy and out-of-domain
stability of various implicit CoT methods, boosting baselines like Coconut by
+8.2% on GPT-2 and CODI by +3.0% on LLaMA-3.1 8B. Demonstrating strong
scalability, SIM-CoT also surpasses the explicit CoT baseline on GPT-2 by 2.1%
with 2.3× greater token efficiency, while substantially closing the
performance gap on larger models like LLaMA-3.1 8B.