SIM-CoT: Supervised Implicit Chain-of-Thought

Abstract

Implicit Chain-of-Thought (CoT) methods offer a promising, token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a persistent performance gap has limited their application. By scaling the computational budget of implicit CoT approaches, we identify a core latent instability issue: as the number of implicit reasoning tokens is increased to improve performance, training often becomes unstable and collapses. Our analysis shows that this instability arises because the latent representations become homogeneous and lose their semantic diversity, a failure caused by insufficient step-level supervision in existing implicit CoT approaches. To address this issue, we propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space. Specifically, SIM-CoT employs an auxiliary decoder during training to align each implicit token with its corresponding explicit reasoning step, ensuring that latent states capture distinct and meaningful information. The auxiliary decoder is removed during inference, so the computational efficiency of implicit CoT methods is preserved with no added overhead. In addition, the auxiliary decoder makes implicit reasoning interpretable by projecting each latent token onto an explicit reasoning vocabulary, enabling per-step visualization of semantic roles and diagnosis. SIM-CoT significantly improves both the in-domain accuracy and the out-of-domain stability of various implicit CoT methods, boosting baselines such as Coconut by +8.2% on GPT-2 and CODI by +3.0% on LLaMA-3.1 8B. Demonstrating strong scalability, SIM-CoT also surpasses the explicit CoT baseline on GPT-2 by 2.1% with 2.3× greater token efficiency, while substantially closing the performance gap on larger models such as LLaMA-3.1 8B.
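The step-level supervision described above can be illustrated with a minimal sketch. The abstract does not specify the auxiliary decoder's architecture, the form of the loss, or its weighting, so everything below is an assumption made for illustration: the decoder is a single hypothetical linear projection onto the vocabulary, each explicit step is scored as a bag of token ids, and the names `aux_decoder_logits`, `step_loss`, `supervised_implicit_loss`, and `step_weight` are invented here. It is not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def aux_decoder_logits(latent: Sequence[float],
                       decoder_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical auxiliary decoder: one linear projection, one row per vocabulary entry."""
    return [sum(w * h for w, h in zip(row, latent)) for row in decoder_weights]


def step_loss(latent: Sequence[float],
              step_token_ids: Sequence[int],
              decoder_weights: Sequence[Sequence[float]]) -> float:
    """Mean negative log-likelihood of one explicit step's tokens given one latent."""
    log_probs = log_softmax(aux_decoder_logits(latent, decoder_weights))
    return -sum(log_probs[t] for t in step_token_ids) / len(step_token_ids)


def supervised_implicit_loss(latents: Sequence[Sequence[float]],
                             steps: Sequence[Sequence[int]],
                             decoder_weights: Sequence[Sequence[float]],
                             answer_loss: float,
                             step_weight: float = 1.0) -> float:
    """Answer loss plus a weighted step-level term (the weighting is an assumption)."""
    if len(latents) != len(steps):
        raise ValueError("need exactly one explicit step per implicit token")
    aux = sum(step_loss(z, s, decoder_weights)
              for z, s in zip(latents, steps)) / len(latents)
    return answer_loss + step_weight * aux


# Toy setup: vocabulary of 3 tokens, latent dimension 2, two reasoning steps.
decoder_weights = [[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]]
steps = [[0], [1]]  # step 1 should decode to token 0, step 2 to token 1

distinct = supervised_implicit_loss([[1.0, 0.0], [0.0, 1.0]], steps,
                                    decoder_weights, answer_loss=0.0)
collapsed = supervised_implicit_loss([[0.5, 0.5], [0.5, 0.5]], steps,
                                     decoder_weights, answer_loss=0.0)
print(distinct < collapsed)  # prints True
```

In this toy setting, identical (homogeneous) latents cannot both decode to their different target steps, so the step-level term penalizes them more than distinct latents. Because the decoder only contributes to the training loss, dropping it at inference leaves the forward pass of the underlying model unchanged, which matches the no-overhead claim in the abstract.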
