

Optimizing Length Compression in Large Reasoning Models

June 17, 2025
Authors: Zhengxiang Cheng, Dongping Chen, Mingyang Fu, Tianyi Zhou
cs.AI

Abstract

Large Reasoning Models (LRMs) have achieved remarkable success, yet they often suffer from producing unnecessary and verbose reasoning chains. We identify a core aspect of this issue as "invalid thinking" -- models tend to repeatedly double-check their work after having derived the correct answer. To address this specific inefficiency, we move beyond the general principles of Efficacy and Efficiency to propose two new, fine-grained principles: Brevity, which advocates for eliminating redundancy, and Sufficiency, which ensures critical reasoning steps are preserved. Guided by these principles, we introduce LC-R1, a post-training method based on Group Relative Policy Optimization (GRPO). LC-R1 employs a novel combination of a Length Reward for overall conciseness and a Compress Reward that is specifically designed to remove the invalid portion of the thinking process. Extensive experiments on multiple reasoning benchmarks demonstrate that LC-R1 achieves a significant reduction in sequence length (~50%) with only a marginal (~2%) drop in accuracy, reaching a favorable trade-off point on the Pareto frontier that prioritizes high compression. Our analysis further validates the robustness of LC-R1 and provides valuable insights for developing more powerful yet computationally efficient LRMs. Our code is released at https://github.com/zxiangx/LC-R1.
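To make the reward design described in the abstract more concrete, the sketch below shows one plausible way to combine a correctness gate with a length term (Brevity) and a compression term that discounts tokens generated after the answer has already been derived (the "invalid thinking" targeted by the Compress Reward). This is a minimal illustration under our own assumptions, not the paper's actual formulation: the function names, the `first_correct_token` signal, and the `alpha`/`beta` weights are all hypothetical.

```python
# Illustrative sketch of a brevity + compression reward for GRPO-style post-training.
# Assumptions (not from the paper): per-rollout token counts, an oracle index of the
# first token at which the correct answer is fully derived, and linear weights.

def length_reward(num_tokens: int, max_tokens: int) -> float:
    """Brevity: reward shorter rollouts, 1.0 for an empty chain, 0.0 at the budget."""
    return 1.0 - min(num_tokens, max_tokens) / max_tokens

def compress_reward(num_tokens: int, first_correct_token: int | None) -> float:
    """Penalize 'invalid thinking': tokens emitted after the answer was already derived."""
    if first_correct_token is None:  # answer never derived -> no compression credit
        return 0.0
    invalid = max(num_tokens - first_correct_token, 0)
    return 1.0 - invalid / max(num_tokens, 1)

def total_reward(correct: bool, num_tokens: int, first_correct_token: int | None,
                 max_tokens: int = 4096, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Sufficiency as a hard gate: wrong answers earn nothing, so compression never
    pays off by dropping steps that were actually needed to reach the answer."""
    if not correct:
        return 0.0
    return (1.0
            + alpha * length_reward(num_tokens, max_tokens)
            + beta * compress_reward(num_tokens, first_correct_token))

# Example: a correct 1200-token rollout whose answer was derived by token 700
# earns more than an equally correct 1200-token rollout that only finishes at 1200.
print(total_reward(True, 1200, 700))   # higher reward
print(total_reward(True, 1200, 1200))  # lower reward
```

Gating both shaping terms on correctness is one simple way to respect Sufficiency: compression can only be rewarded on rollouts that still reach the right answer, so the optimizer is not incentivized to cut reasoning steps the solution actually depends on.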