Optimizing Length Compression in Large Reasoning Models
June 17, 2025
Authors: Zhengxiang Cheng, Dongping Chen, Mingyang Fu, Tianyi Zhou
cs.AI
Abstract
Large Reasoning Models (LRMs) have achieved remarkable success, yet they
often suffer from producing unnecessary and verbose reasoning chains. We
identify a core aspect of this issue as "invalid thinking" -- models tend to
repeatedly double-check their work after having derived the correct answer. To
address this specific inefficiency, we move beyond the general principles of
Efficacy and Efficiency to propose two new, fine-grained principles: Brevity,
which advocates for eliminating redundancy, and Sufficiency, which ensures
critical reasoning steps are preserved. Guided by these principles, we
introduce LC-R1, a post-training method based on Group Relative Policy
Optimization (GRPO). LC-R1 employs a novel combination of a Length Reward for
overall conciseness and a Compress Reward that is specifically designed to
remove the invalid portion of the thinking process. Extensive experiments on
multiple reasoning benchmarks demonstrate that LC-R1 achieves a significant
reduction in sequence length (~50%) with only a marginal (~2%) drop in
accuracy, reaching a favorable trade-off point on the Pareto frontier that
prioritizes high compression. Our analysis further validates the robustness of
LC-R1 and provides valuable insights for developing more powerful yet
computationally efficient LRMs. Our code is released at
https://github.com/zxiangx/LC-R1.
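The reward design described above can be illustrated with a minimal sketch. All function names, signatures, and the exact formulas below are assumptions for illustration only, not the paper's implementation; the actual LC-R1 rewards are defined in the released code.

```python
# Hypothetical sketch of an LC-R1-style reward combination.
# Assumptions (not from the paper): linear length scaling, a token-count
# penalty for "invalid thinking", and a mixing weight `alpha`.

def length_reward(token_len: int, max_len: int, correct: bool) -> float:
    """Brevity: among correct responses, shorter sequences score higher.
    Incorrect responses receive no length bonus."""
    if not correct:
        return 0.0
    return 1.0 - token_len / max_len  # shorter -> closer to 1.0

def compress_reward(first_answer_pos: int, token_len: int) -> float:
    """Penalize 'invalid thinking': tokens generated after the position
    where the correct answer first appears (e.g., repeated double-checks)
    contribute a penalty proportional to their share of the sequence."""
    if token_len == 0:
        return 0.0
    invalid_tokens = max(0, token_len - first_answer_pos)
    return -invalid_tokens / token_len

def total_reward(token_len: int, max_len: int, correct: bool,
                 first_answer_pos: int, alpha: float = 0.5) -> float:
    """Combine both signals; `alpha` (hypothetical) weights compression."""
    return (length_reward(token_len, max_len, correct)
            + alpha * compress_reward(first_answer_pos, token_len))
```

In a GRPO setup, such a scalar would be computed per sampled response and normalized within each group to form the advantage; Sufficiency is respected because the penalty only targets tokens after the first correct answer, leaving the reasoning that produced it untouched.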