SIM-CoT: Supervised Implicit Chain-of-Thought
September 24, 2025
作者: Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Jiaqi Wang, Xipeng Qiu, Dahua Lin
cs.AI
Abstract
Implicit Chain-of-Thought (CoT) methods offer a promising, token-efficient
alternative to explicit CoT reasoning in Large Language Models (LLMs), but a
persistent performance gap has limited their adoption. We
identify a core latent instability issue by scaling the computational budget of
implicit CoT approaches: as we increase the number of implicit reasoning tokens
to enhance performance, the training process often becomes unstable and
collapses. Our analysis reveals that this instability arises from the latent
representations becoming homogeneous and losing their semantic diversity, a
failure caused by insufficient step-level supervision in existing implicit CoT
approaches. To address this issue, we propose SIM-CoT, a plug-and-play training
module that introduces step-level supervision to stabilize and enrich the
latent reasoning space. Specifically, SIM-CoT employs an auxiliary decoder
during training to align each implicit token with its corresponding explicit
reasoning step, ensuring that latent states capture distinct and meaningful
information. The proposed auxiliary decoder is removed during inference,
preserving the computational efficiency of implicit CoT methods with no added
overhead. In addition, the auxiliary decoder provides interpretability for
implicit reasoning by projecting each latent token onto an explicit reasoning
vocabulary, enabling per-step visualization and diagnosis of semantic roles.
SIM-CoT significantly enhances both the in-domain accuracy and out-of-domain
stability of various implicit CoT methods, boosting baselines like Coconut by
+8.2% on GPT-2 and CODI by +3.0% on LLaMA-3.1 8B. Demonstrating strong
scalability, SIM-CoT also surpasses the explicit CoT baseline on GPT-2 by 2.1%
with 2.3× greater token efficiency, while substantially closing the
performance gap on larger models like LLaMA-3.1 8B.