SIM-CoT: Supervised Implicit Chain-of-Thought

Abstract

Implicit Chain-of-Thought (CoT) methods present a promising, token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a persistent performance gap has limited the application of implicit CoT. We identify a core latent instability issue by scaling the computational budget of implicit CoT approaches: as we increase the number of implicit reasoning tokens to enhance performance, the training process often becomes unstable and collapses. Our analysis reveals that this instability arises from the latent representations becoming homogeneous and losing their semantic diversity, a failure caused by insufficient step-level supervision in existing implicit CoT approaches. To address this issue, we propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space. Specifically, SIM-CoT employs an auxiliary decoder during training to align each implicit token with its corresponding explicit reasoning step, ensuring that latent states capture distinct and meaningful information. The proposed auxiliary decoder is removed during inference, preserving the computational efficiency of implicit CoT methods with no added overhead. In addition, the auxiliary decoder affords interpretability of implicit reasoning by projecting each latent token onto an explicit reasoning vocabulary, enabling per-step visualization of semantic roles and diagnosis. SIM-CoT significantly enhances both the in-domain accuracy and out-of-domain stability of various implicit CoT methods, boosting baselines like Coconut by +8.2% on GPT-2 and CODI by +3.0% on LLaMA-3.1 8B. Demonstrating strong scalability, SIM-CoT also surpasses the explicit CoT baseline on GPT-2 by 2.1% with 2.3× greater token efficiency, while substantially closing the performance gap on larger models like LLaMA-3.1 8B.
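
The step-level supervision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `aux_weight` parameter, and the toy vocabulary are hypothetical, and the auxiliary decoder is replaced here by a single linear projection scored with a bag-of-tokens cross-entropy, whereas the real decoder is presumably a full language-model decoder. The sketch only shows the shape of the idea: each latent token is paired with one explicit reasoning step, the auxiliary loss is added during training, and the same projection can be reused to inspect what a latent token encodes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def project(latent, weights):
    """Assumed stand-in for the auxiliary decoder: a linear map from a
    latent vector to vocabulary logits (one weight row per vocabulary id)."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]

def step_level_loss(latents, step_token_ids, weights):
    """Average cross-entropy between each latent token's decoded
    distribution and the tokens of its paired explicit reasoning step."""
    if len(latents) != len(step_token_ids):
        raise ValueError("need exactly one explicit step per latent token")
    total, count = 0.0, 0
    for latent, token_ids in zip(latents, step_token_ids):
        probs = softmax(project(latent, weights))
        for tok in token_ids:
            total -= math.log(probs[tok])
            count += 1
    return total / count

def training_loss(answer_loss, latents, step_token_ids, weights, aux_weight=1.0):
    """Training-time objective: answer loss plus the weighted auxiliary term.
    At inference the auxiliary term (and its decoder) is simply not used."""
    return answer_loss + aux_weight * step_level_loss(latents, step_token_ids, weights)

def top_tokens(latent, weights, vocab, k=1):
    """Interpretability view: the k vocabulary entries a latent decodes to."""
    probs = softmax(project(latent, weights))
    ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

# Toy setup (all values hypothetical): 3-token vocabulary, identity projection.
VOCAB = ["add", "multiply", "subtract"]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
STEPS = [[0], [1]]  # explicit step 1 uses "add", step 2 uses "multiply"

LATENTS = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]         # distinct latents
COLLAPSED = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]       # homogeneous latents

aligned = step_level_loss(LATENTS, STEPS, W)
collapsed = step_level_loss(COLLAPSED, STEPS, W)
print("aligned loss:", aligned)
print("collapsed loss:", collapsed)
print("latent 1 decodes to:", top_tokens(LATENTS[0], W, VOCAB))
```

In this toy setting the homogeneous latents incur a higher auxiliary loss than the distinct ones, because a single repeated latent cannot match two different explicit steps. That is the pressure toward semantic diversity the abstract attributes to step-level supervision.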
