QwenLong-CPRS: Towards Infinite-Length LLMs with Dynamic Context Optimization

Abstract

This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization. It targets two problems that large language models (LLMs) face when processing long sequences: the prohibitive computational overhead of the prefill stage and the "lost in the middle" performance degradation. QwenLong-CPRS is implemented through a novel dynamic context optimization mechanism. It performs multi-granularity context compression guided by natural language instructions, which improves both efficiency and performance. Built on the Qwen architecture series, QwenLong-CPRS introduces four key innovations: (1) natural language-guided dynamic optimization, (2) bidirectional reasoning layers for enhanced boundary awareness, (3) token critic mechanisms with language modeling heads, and (4) window-parallel inference. Comprehensive evaluations on five benchmarks, with contexts ranging from 4K to 2M words, show that QwenLong-CPRS is effective in three respects. (1) It consistently outperforms other context management methods, such as RAG and sparse attention, in both accuracy and efficiency. (2) It integrates in an architecture-agnostic manner with the flagship LLMs evaluated, including GPT-4o, Gemini2.0-pro, Claude3.7-sonnet, DeepSeek-v3, and Qwen2.5-max. With these models it achieves 21.59× context compression alongside an average performance gain of 19.15 points. (3) Deployed with Qwen2.5-32B-Instruct, QwenLong-CPRS surpasses leading proprietary LLMs by 4.85 points on Ruler-128K and by 10.88 points on InfiniteBench, establishing new state-of-the-art (SOTA) performance.
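The following is a minimal, illustrative sketch of query-guided, token-level context compression with window-parallel scoring. It is not the QwenLong-CPRS implementation. The report uses a Qwen-based model with bidirectional reasoning layers and a token critic built on language modeling heads. Here that model is replaced by an arbitrary per-token scoring function. The names `compress`, `split_windows`, `compression_ratio`, and `keyword_scorer`, as well as the window size and threshold, are hypothetical choices made for illustration. The sketch shows the general shape of the idea. The long context is split into fixed-size windows, each window is scored independently against the instruction (so windows could be distributed across workers), tokens above a threshold are kept in their original order, and the compression ratio is reported as original length divided by compressed length.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical scorer: given a query and a window of tokens, return one
# relevance score per token. This stands in for the report's token critic.
Scorer = Callable[[str, Sequence[str]], List[float]]


def split_windows(tokens: Sequence[str], window_size: int) -> List[Sequence[str]]:
    """Cut the token sequence into consecutive, non-overlapping windows."""
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def compress(tokens: Sequence[str], query: str, scorer: Scorer,
             window_size: int = 4, threshold: float = 0.5,
             max_workers: int = 4) -> List[str]:
    """Keep tokens whose score meets the threshold, scoring windows in parallel."""
    windows = split_windows(tokens, window_size)

    def keep(window: Sequence[str]) -> List[str]:
        scores = scorer(query, window)
        return [tok for tok, s in zip(window, scores) if s >= threshold]

    # Windows are scored independently; pool.map returns results in input
    # order, so the surviving tokens keep their original document order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        kept_per_window = list(pool.map(keep, windows))
    return [tok for kept in kept_per_window for tok in kept]


def compression_ratio(original_len: int, compressed_len: int) -> float:
    """Compression factor, e.g. 21.59 means the context shrank 21.59-fold."""
    return original_len / compressed_len


# Toy scorer (hypothetical): 1.0 if the token appears in the query, else 0.0.
def keyword_scorer(query: str, window: Sequence[str]) -> List[float]:
    terms = set(query.lower().split())
    return [1.0 if tok.lower() in terms else 0.0 for tok in window]


doc = "the capital of france is paris and the capital of japan is tokyo".split()
kept = compress(doc, "capital france paris", keyword_scorer)
print(kept)                                     # ['capital', 'france', 'paris', 'capital']
print(compression_ratio(len(doc), len(kept)))   # 3.25
```

Under the same definition, a 21.59× ratio means a 128K-token context would be reduced to roughly 5.9K tokens before reaching the downstream LLM. Thread-based parallelism is used here only to show that windows are independent. A real deployment would batch windows through the compression model instead.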