Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Abstract

Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost by fine-tuning with heuristic length penalties, but they suppress essential reasoning along with redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle and identify a key theoretical gap in applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general reinforcement learning objective: maximize task reward while compressing completions under a prior over reasoning traces, which subsumes common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
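The relationship between a length penalty and a surprisal-based cost can be illustrated with a small sketch. This is not the paper's implementation: the additive form "task reward minus beta times cost", the names `trace_cost`, `uniform_log_probs`, `penalized_reward`, and the parameters `beta` and `vocab_size` are all assumptions made for illustration. The sketch only shows that when every token has the same prior probability, the surprisal cost is proportional to the token count, whereas a non-uniform prior assigns different costs to traces of equal length.

```python
import math


def trace_cost(token_log_probs):
    """Total surprisal (in nats) of a reasoning trace under some prior.

    token_log_probs holds log p(token | context) for each token of the trace,
    as scored by the prior. How those values are obtained is outside this sketch.
    """
    return -sum(token_log_probs)


def uniform_log_probs(num_tokens, vocab_size):
    """Log-probabilities of a trace under a uniform prior over the vocabulary."""
    return [-math.log(vocab_size)] * num_tokens


def penalized_reward(task_reward, token_log_probs, beta):
    """Hypothetical scalar reward: task reward minus a weighted compression cost.

    beta is an assumed trade-off weight, not a value taken from the paper.
    """
    return task_reward - beta * trace_cost(token_log_probs)


if __name__ == "__main__":
    # Uniform prior: cost depends only on length, so it acts as a length penalty.
    short = uniform_log_probs(num_tokens=10, vocab_size=1000)
    long = uniform_log_probs(num_tokens=20, vocab_size=1000)
    print(trace_cost(long) / trace_cost(short))

    # Non-uniform prior: two traces of equal length can have different costs.
    # These log-probabilities are made-up numbers for illustration only.
    low_surprisal = [-0.1, -0.1, -0.1, -0.1]
    high_surprisal = [-3.0, -3.0, -3.0, -3.0]
    print(penalized_reward(1.0, low_surprisal, beta=0.1))
    print(penalized_reward(1.0, high_surprisal, beta=0.1))
```

Under the uniform prior, doubling the number of tokens doubles the cost regardless of content. Under the made-up non-uniform scores, the two four-token traces receive different penalized rewards even though a pure token count would treat them identically.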
