Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
March 9, 2026
Authors: Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi
cs.AI
Abstract
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, but in doing so suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap in applying the naive IB to Transformers: attention violates the Markov property between the prompt, the reasoning trace, and the response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general reinforcement learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language-model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
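The shape of the proposed objective can be illustrated with a minimal sketch. The function below is a hypothetical reward of the form "task reward minus a weighted compression cost", where the cost is the trace's total surprisal under a prior language model; the function name, arguments, and `beta` weight are illustrative assumptions, not the paper's implementation. It also shows how a uniform prior collapses the semantic penalty into a plain length penalty, as the abstract claims.

```python
import math

def cib_reward(task_reward, token_logprobs_under_prior, beta=0.01):
    """Hypothetical CIB-style RL reward (illustrative, not the paper's code).

    task_reward: scalar reward for the final answer (e.g., 1.0 if correct).
    token_logprobs_under_prior: log p_prior(z_t | z_<t, x) for each token
        of the reasoning trace, scored by a prior language model.
    beta: compression strength trading off reward against trace cost.
    """
    # Semantic compression cost: total surprisal of the trace under the prior.
    surprisal = -sum(token_logprobs_under_prior)
    return task_reward - beta * surprisal

# Special case: under a uniform prior over a vocabulary of size V, every
# token has surprisal log(V), so the penalty reduces to beta * log(V) * length,
# i.e., an ordinary length penalty.
V = 32000
uniform_logprob = -math.log(V)
trace_len = 100
r_uniform = cib_reward(1.0, [uniform_logprob] * trace_len, beta=0.001)
```

Under a semantic (language-model) prior, fluent, predictable tokens are cheap while high-surprisal filler is expensive, which is what lets the objective prune bloat without flattening all tokens to equal cost.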