Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
March 9, 2026
Authors: Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi
cs.AI
Abstract
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods, which reduce cost via fine-tuning with heuristic length penalties, suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting-based approaches, we introduce a semantic prior that measures token cost by surprisal under a language model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
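The objective described above can be sketched as follows. This is a minimal, hedged illustration, not the paper's exact formulation: the function names, the scalar `beta` trade-off, and the scalarized reward form are assumptions for exposition. It shows the key contrast the abstract draws: a semantic prior charges each reasoning token its surprisal under a prior language model, and a uniform prior over the vocabulary collapses this cost into a plain length penalty.

```python
import math

def cib_style_reward(task_reward, token_logprobs, beta=0.01):
    """Illustrative CIB-style RL reward: task reward minus a compression cost.

    task_reward: scalar reward for the final response (e.g., 1.0 if correct).
    token_logprobs: log-probabilities of each reasoning-trace token under a
        prior language model; a token's surprisal is its negative log-prob,
        so predictable filler is cheap and informative tokens cost more.
    beta: trade-off coefficient controlling compression strength.
    """
    surprisal = -sum(token_logprobs)  # semantic cost of the reasoning trace
    return task_reward - beta * surprisal

def length_penalty_reward(task_reward, num_tokens, beta=0.01, vocab_size=32000):
    """Special case: under a uniform prior, every token has the same
    surprisal log(vocab_size), so the semantic cost reduces to a
    length penalty proportional to the token count."""
    return task_reward - beta * num_tokens * math.log(vocab_size)
```

In a fine-tuning loop, `token_logprobs` would be scored by a frozen prior LM over the sampled reasoning trace, and the resulting reward optimized with any policy-gradient method.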