Compressing Chain-of-Thought in LLMs via Step Entropy

Abstract

Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate verbose thought processes with considerable redundancy, which increases inference cost and reduces efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of each individual reasoning step and thereby identifies redundant ones. Through theoretical analysis and extensive empirical validation on mathematical reasoning benchmarks, we demonstrate that steps with low entropy are indeed highly redundant. Our experiments show that 80% of low-entropy intermediate steps can be pruned with only minor degradation in final answer accuracy across DeepSeek-R1-7B, DeepSeek-R1-14B, and Qwen3-8B. This contrasts sharply with pruning random or high-entropy steps, which severely impairs reasoning performance. Building on this, we propose a novel two-stage training strategy that combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) reinforcement learning. By strategically incorporating [SKIP] tokens, this approach enables LLMs to learn to generate compressed CoTs autonomously during inference. Our method significantly improves LLM inference efficiency while rigorously preserving accuracy, with important implications for practical LLM deployment and for a deeper understanding of reasoning structures.
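The pruning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes that a step's entropy is the sum of the Shannon entropies of the next-token distributions over the tokens in that step, and that pruned steps are replaced by a literal `[SKIP]` marker. The function names (`token_entropy`, `step_entropy`, `prune_low_entropy_steps`) and the `ratio` parameter are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

SKIP = "[SKIP]"  # marker named in the abstract; how it is tokenized is an assumption


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Assumed definition: sum of token-level entropies over the tokens of a step."""
    return sum(token_entropy(dist) for dist in token_dists)


def prune_low_entropy_steps(
    steps: List[str], entropies: List[float], ratio: float = 0.8
) -> List[str]:
    """Replace the lowest-entropy `ratio` fraction of steps with the SKIP marker.

    Step order is preserved; only the content of pruned steps changes.
    """
    if len(steps) != len(entropies):
        raise ValueError("steps and entropies must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    num_pruned = int(len(steps) * ratio)
    # Indices of the steps with the smallest entropy values.
    by_entropy = sorted(range(len(steps)), key=lambda i: entropies[i])
    pruned = set(by_entropy[:num_pruned])
    return [SKIP if i in pruned else step for i, step in enumerate(steps)]
```

In practice the per-token distributions would come from the model's own output probabilities while it generates the reasoning trace; here they are simply passed in as lists so that the sketch stays self-contained.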
