Optimizing Length Compression in Large Reasoning Models
June 17, 2025
Authors: Zhengxiang Cheng, Dongping Chen, Mingyang Fu, Tianyi Zhou
cs.AI
Abstract
Large Reasoning Models (LRMs) have achieved remarkable success, yet they
often suffer from producing unnecessary and verbose reasoning chains. We
identify a core aspect of this issue as "invalid thinking" -- models tend to
repeatedly double-check their work after having derived the correct answer. To
address this specific inefficiency, we move beyond the general principles of
Efficacy and Efficiency to propose two new, fine-grained principles: Brevity,
which advocates for eliminating redundancy, and Sufficiency, which ensures
critical reasoning steps are preserved. Guided by these principles, we
introduce LC-R1, a post-training method based on Group Relative Policy
Optimization (GRPO). LC-R1 employs a novel combination of a Length Reward for
overall conciseness and a Compress Reward that is specifically designed to
remove the invalid portion of the thinking process. Extensive experiments on
multiple reasoning benchmarks demonstrate that LC-R1 achieves a significant
reduction in sequence length (~50%) with only a marginal (~2%) drop in
accuracy, reaching a favorable point on the Pareto frontier that
prioritizes high compression. Our analysis further validates the robustness of
LC-R1 and provides valuable insights for developing more powerful yet
computationally efficient LRMs. Our code is released at
https://github.com/zxiangx/LC-R1.
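The abstract does not give the exact reward formulas, but the described combination of a Length Reward (overall conciseness) and a Compress Reward (penalizing the invalid "double-checking" tail after the correct answer is reached) can be sketched roughly as follows. All names and the equal weighting here are illustrative assumptions, not the paper's actual formulation:

```python
def lc_r1_reward(response_len: int, max_len: int,
                 valid_len: int, is_correct: bool) -> float:
    """Illustrative sketch of an LC-R1-style combined reward.

    Assumptions (not specified in the abstract):
    - response_len: total tokens in the generated reasoning chain.
    - valid_len: tokens up to the point where the correct answer is
      first derived; the remainder is "invalid thinking".
    - Rewards are only given for correct answers, each term lies in
      [0, 1], and the two terms are equally weighted.
    """
    if not is_correct:
        return 0.0
    # Length Reward: shorter overall responses score higher.
    length_reward = 1.0 - response_len / max_len
    # Compress Reward: a smaller invalid tail (valid_len closer to
    # response_len) scores higher.
    compress_reward = valid_len / response_len
    return 0.5 * length_reward + 0.5 * compress_reward
```

Under this toy formulation, a response that stops right after deriving the answer dominates one of the same accuracy that keeps re-verifying; in the actual method these scalar rewards would feed into GRPO's group-relative advantage estimates during post-training.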