Context Compression via Explicit Information Transmission
February 3, 2026
Authors: Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He
cs.AI
Abstract
Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers, and (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that recasts soft compression as a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission, which selectively transmits multi-layer information into token anchors to mitigate progressive overwriting, and (ii) width-wise transmission, which aggregates anchors into a small number of slots via a globally optimized transmission plan to ensure coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.
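To make the two transmission steps concrete, below is a minimal, hypothetical PyTorch sketch of how depth-wise and width-wise transmission could operate over frozen LLM hidden states. The module names (`DepthTransmit`, `WidthTransmit`), the learned per-layer scoring head, and the Sinkhorn-style normalization used to stand in for a "globally optimized transmission plan" are all illustrative assumptions; the abstract does not specify the paper's exact parameterization.

```python
# Hypothetical sketch of ComprExIT's two steps, under the assumptions stated above.
import torch
import torch.nn as nn

class DepthTransmit(nn.Module):
    """Depth-wise transmission: fuse each token's per-layer states into one anchor."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)  # learned relevance of each layer's state

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [n_layers, n_tokens, d_model], frozen LLM hidden states
        w = torch.softmax(self.score(hidden).squeeze(-1), dim=0)  # [n_layers, n_tokens]
        return torch.einsum("lt,ltd->td", w, hidden)              # anchors: [n_tokens, d_model]

class WidthTransmit(nn.Module):
    """Width-wise transmission: aggregate token anchors into K slots through a
    globally normalized (Sinkhorn-style) transport plan rather than per-token attention."""
    def __init__(self, d_model: int, n_slots: int, n_iters: int = 5):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.n_iters = n_iters

    def forward(self, anchors: torch.Tensor) -> torch.Tensor:
        # anchors: [n_tokens, d_model]
        logits = anchors @ self.slots.T / anchors.shape[-1] ** 0.5  # [n_tokens, n_slots]
        plan = torch.exp(logits)
        for _ in range(self.n_iters):
            # Alternate row/column normalization so every token's mass is placed
            # and every slot's capacity is filled, coordinating allocation globally.
            plan = plan / plan.sum(dim=1, keepdim=True)
            plan = plan / plan.sum(dim=0, keepdim=True)
        plan = plan / plan.sum(dim=0, keepdim=True)  # columns sum to 1 -> weighted means
        return plan.T @ anchors                       # compressed slots: [n_slots, d_model]

# Usage: compress 4096 token positions (with 32 layers of frozen states) into 64 slots.
n_layers, n_tokens, d_model, n_slots = 32, 4096, 4096, 64
hidden = torch.randn(n_layers, n_tokens, d_model)  # stand-in for frozen LLM states
slots = WidthTransmit(d_model, n_slots)(DepthTransmit(d_model)(hidden))
print(slots.shape)  # torch.Size([64, 4096])
```

In this reading, only the small scoring head and the slot embeddings are trainable (consistent with the reported ~1% parameter overhead), and the resulting slots would be fed to the frozen LLM as the continuous compressed representations of the long context.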