SIM-CoT: Supervised Implicit Chain-of-Thought

September 24, 2025
Authors: Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Jiaqi Wang, Xipeng Qiu, Dahua Lin
cs.AI

Abstract

Implicit Chain-of-Thought (CoT) methods present a promising, token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a persistent performance gap has limited the application of implicit CoT. We identify a core latent instability issue by scaling the computational budget of implicit CoT approaches: as we increase the number of implicit reasoning tokens to enhance performance, the training process often becomes unstable and collapses. Our analysis reveals that this instability arises from the latent representations becoming homogeneous and losing their semantic diversity, a failure caused by insufficient step-level supervision in existing implicit CoT approaches. To address this issue, we propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space. Specifically, SIM-CoT employs an auxiliary decoder during training to align each implicit token with its corresponding explicit reasoning step, ensuring that latent states capture distinct and meaningful information. The auxiliary decoder is removed during inference, preserving the computational efficiency of implicit CoT methods with no added overhead. In addition, the auxiliary decoder affords interpretability of implicit reasoning by projecting each latent token onto an explicit reasoning vocabulary, enabling per-step visualization and diagnosis of semantic roles. SIM-CoT significantly enhances both the in-domain accuracy and out-of-domain stability of various implicit CoT methods, boosting baselines like Coconut by +8.2% on GPT-2 and CODI by +3.0% on LLaMA-3.1 8B. Demonstrating strong scalability, SIM-CoT also surpasses the explicit CoT baseline on GPT-2 by 2.1% with 2.3× greater token efficiency, while substantially closing the performance gap on larger models like LLaMA-3.1 8B.
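
The abstract describes step-level supervision through an auxiliary decoder that is attached only during training and dropped at inference. The sketch below illustrates what such supervision could look like in PyTorch; the module names, tensor shapes, one-summary-token-per-step labels, and cross-entropy form are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of SIM-CoT-style step-level supervision.
# Assumptions: module names, shapes, and the one-token-per-step
# labelling are illustrative, not the paper's actual code.
import torch
import torch.nn as nn

class AuxiliaryStepDecoder(nn.Module):
    """Training-only decoder mapping each latent reasoning token
    to the vocabulary of its corresponding explicit reasoning step."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, latent_steps: torch.Tensor) -> torch.Tensor:
        # latent_steps: (batch, num_implicit_tokens, hidden_dim)
        return self.proj(latent_steps)  # logits over the explicit vocabulary

def step_supervision_loss(latent_steps: torch.Tensor,
                          explicit_step_tokens: torch.Tensor,
                          decoder: AuxiliaryStepDecoder) -> torch.Tensor:
    """Align the i-th implicit token with the i-th explicit reasoning step.

    explicit_step_tokens: (batch, num_implicit_tokens) token ids,
    one summary token per step in this simplified sketch.
    """
    logits = decoder(latent_steps)  # (B, T, V)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        explicit_step_tokens.reshape(-1),
    )

# At inference the auxiliary decoder is simply discarded, so the
# implicit-CoT forward pass is unchanged and incurs no extra cost.
```

Because the decoder only adds a training-time loss term, it can in principle be bolted onto existing implicit CoT pipelines (e.g., Coconut or CODI) without touching their inference path, which is the plug-and-play property the abstract emphasizes.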