QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization

Abstract

This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization. It addresses two problems that large language models (LLMs) face when processing long sequences: prohibitive computation overhead during the prefill stage and "lost in the middle" performance degradation. Implemented through a novel dynamic context optimization mechanism, QwenLong-CPRS enables multi-granularity context compression guided by natural language instructions, achieving both efficiency gains and improved performance. Evolved from the Qwen architecture series, QwenLong-CPRS introduces four key innovations: (1) natural language-guided dynamic optimization, (2) bidirectional reasoning layers for enhanced boundary awareness, (3) token critic mechanisms with language modeling heads, and (4) window-parallel inference. Comprehensive evaluations across five benchmarks (4K-2M word contexts) demonstrate QwenLong-CPRS's threefold effectiveness: (1) consistent superiority over other context management methods, such as RAG and sparse attention, in both accuracy and efficiency; (2) architecture-agnostic integration with all flagship LLMs, including GPT-4o, Gemini2.0-pro, Claude3.7-sonnet, DeepSeek-v3, and Qwen2.5-max, achieving 21.59× context compression alongside 19.15-point average performance gains; and (3) when deployed with Qwen2.5-32B-Instruct, QwenLong-CPRS surpasses leading proprietary LLMs by 4.85 and 10.88 points on Ruler-128K and InfiniteBench, establishing new state-of-the-art (SOTA) performance.
