Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
March 9, 2026
Authors: Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi
cs.AI
Abstract
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods, which reduce cost via fine-tuning with heuristic length penalties, suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting-based approaches, we introduce a semantic prior that measures token cost by surprisal under a language model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
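The objective described above can be sketched as follows. This is a minimal, hedged illustration, not the paper's exact formulation: the function names, the scalar `beta` trade-off, and the scalarized reward form are assumptions for exposition. It shows the key contrast the abstract draws: a semantic prior charges each reasoning token its surprisal under a prior language model, and a uniform prior over the vocabulary collapses this cost into a plain length penalty.

```python
import math

def cib_style_reward(task_reward, token_logprobs, beta=0.01):
    """Illustrative CIB-style RL reward: task reward minus a compression cost.

    task_reward: scalar reward for the final response (e.g., 1.0 if correct).
    token_logprobs: log-probabilities of each reasoning-trace token under a
        prior language model; a token's surprisal is its negative log-prob,
        so predictable filler is cheap and informative tokens cost more.
    beta: trade-off coefficient controlling compression strength.
    """
    surprisal = -sum(token_logprobs)  # semantic cost of the reasoning trace
    return task_reward - beta * surprisal

def length_penalty_reward(task_reward, num_tokens, beta=0.01, vocab_size=32000):
    """Special case: under a uniform prior, every token has the same
    surprisal log(vocab_size), so the semantic cost reduces to a
    length penalty proportional to the token count."""
    return task_reward - beta * num_tokens * math.log(vocab_size)
```

In a fine-tuning loop, `token_logprobs` would be scored by a frozen prior LM over the sampled reasoning trace, and the resulting reward optimized with any policy-gradient method.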