Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
March 9, 2026
Authors: Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi
cs.AI
Abstract
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, but in doing so suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap in applying the naive IB to Transformers: attention violates the Markov property between the prompt, the reasoning trace, and the response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general reinforcement learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language-model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
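The shape of the proposed objective can be illustrated with a minimal sketch. The function below is a hypothetical reward of the form "task reward minus a weighted compression cost", where the cost is the trace's total surprisal under a prior language model; the function name, arguments, and `beta` weight are illustrative assumptions, not the paper's implementation. It also shows how a uniform prior collapses the semantic penalty into a plain length penalty, as the abstract claims.

```python
import math

def cib_reward(task_reward, token_logprobs_under_prior, beta=0.01):
    """Hypothetical CIB-style RL reward (illustrative, not the paper's code).

    task_reward: scalar reward for the final answer (e.g., 1.0 if correct).
    token_logprobs_under_prior: log p_prior(z_t | z_<t, x) for each token
        of the reasoning trace, scored by a prior language model.
    beta: compression strength trading off reward against trace cost.
    """
    # Semantic compression cost: total surprisal of the trace under the prior.
    surprisal = -sum(token_logprobs_under_prior)
    return task_reward - beta * surprisal

# Special case: under a uniform prior over a vocabulary of size V, every
# token has surprisal log(V), so the penalty reduces to beta * log(V) * length,
# i.e., an ordinary length penalty.
V = 32000
uniform_logprob = -math.log(V)
trace_len = 100
r_uniform = cib_reward(1.0, [uniform_logprob] * trace_len, beta=0.001)
```

Under a semantic (language-model) prior, fluent, predictable tokens are cheap while high-surprisal filler is expensive, which is what lets the objective prune bloat without flattening all tokens to equal cost.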