QwenLong-CPRS: Towards Infinite LLMs with Dynamic Context Optimization

Abstract

This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization. It addresses two problems that large language models (LLMs) face when processing long sequences: the prohibitive computational overhead of the prefill stage and the "lost in the middle" performance degradation. Built on a novel dynamic context optimization mechanism, QwenLong-CPRS performs multi-granularity context compression guided by natural language instructions, improving both efficiency and performance. As an evolution of the Qwen architecture series, QwenLong-CPRS introduces four key innovations: (1) natural language-guided dynamic optimization, (2) bidirectional reasoning layers for enhanced boundary awareness, (3) a token critic mechanism with language modeling heads, and (4) window-parallel inference. Comprehensive evaluations across five benchmarks, with contexts ranging from 4K to 2M words, demonstrate QwenLong-CPRS's threefold effectiveness: (1) it consistently outperforms other context management methods, such as RAG and sparse attention, in both accuracy and efficiency; (2) it integrates in an architecture-agnostic way with all flagship LLMs evaluated, including GPT-4o, Gemini2.0-pro, Claude3.7-sonnet, DeepSeek-v3, and Qwen2.5-max, achieving 21.59× context compression alongside an average performance gain of 19.15 points; and (3) when deployed with Qwen2.5-32B-Instruct, it surpasses leading proprietary LLMs by 4.85 and 10.88 points on Ruler-128K and InfiniteBench, respectively, establishing new SOTA performance.
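To make the ideas of window-parallel inference, token-level selection, and compression ratio concrete, the following is a minimal sketch and not the QwenLong-CPRS implementation. Every name in it (`split_windows`, `toy_critic`, `compress_context`, `compression_ratio`), the keyword-matching critic, the window size, and the threshold are hypothetical stand-ins chosen for illustration; the actual framework scores tokens with a trained model and a natural language instruction rather than keyword overlap.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set


def split_windows(tokens: List[str], window_size: int) -> List[List[str]]:
    """Cut the token sequence into consecutive, non-overlapping windows."""
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def toy_critic(token: str, query_terms: Set[str]) -> float:
    """Hypothetical stand-in for a learned token critic: 1.0 on keyword match, else 0.0."""
    return 1.0 if token.lower() in query_terms else 0.0


def compress_window(window: List[str], query_terms: Set[str],
                    critic: Callable[[str, Set[str]], float],
                    threshold: float) -> List[str]:
    """Keep only the tokens whose critic score reaches the threshold."""
    return [tok for tok in window if critic(tok, query_terms) >= threshold]


def compress_context(tokens: List[str], query_terms: Set[str],
                     window_size: int = 4, threshold: float = 0.5,
                     critic: Callable[[str, Set[str]], float] = toy_critic) -> List[str]:
    """Score each window independently (in parallel) and concatenate the kept tokens in order."""
    windows = split_windows(tokens, window_size)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the original token order is preserved.
        kept = list(pool.map(
            lambda w: compress_window(w, query_terms, critic, threshold), windows))
    return [tok for window in kept for tok in window]


def compression_ratio(original_len: int, compressed_len: int) -> float:
    """Ratio of original to compressed length; 21.59 would mean a 21.59x shorter context."""
    if compressed_len == 0:
        raise ValueError("compressed context is empty")
    return original_len / compressed_len


tokens = "the launch code is 7421 and the weather is mild today".split()
query_terms = {"launch", "code", "7421"}
compressed = compress_context(tokens, query_terms)
ratio = compression_ratio(len(tokens), len(compressed))
```

Because each window is scored independently in this sketch, the work parallelizes trivially and the downstream LLM only has to prefill the shortened `compressed` sequence instead of the full input.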