
Context Compression via Explicit Information Transmission

February 3, 2026
Authors: Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He
cs.AI

Abstract

Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers, and (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that recasts soft compression as a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission, which selectively transmits multi-layer information into token anchors, mitigating progressive overwriting, and (ii) width-wise transmission, which aggregates anchors into a small number of slots via a globally optimized transmission plan, ensuring coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.
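
The two-stage design described in the abstract lends itself to a compact illustration. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the module names (`DepthTransmission`, `WidthTransmission`), the per-token layer scorer, the Sinkhorn-style balancing standing in for the "globally optimized transmission plan", and all tensor shapes are assumptions filled in for illustration only.

```python
import torch
import torch.nn as nn

class DepthTransmission(nn.Module):
    """Mix frozen hidden states across layers into one anchor per token.
    (Hypothetical parameterization; the paper's scorer may differ.)"""
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # per-layer, per-token relevance score

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (n_layers, seq_len, d_model) -- frozen LLM states, never updated
        scores = self.scorer(hidden).squeeze(-1)       # (n_layers, seq_len)
        weights = scores.softmax(dim=0).unsqueeze(-1)  # normalize over the layer axis
        return (weights * hidden).sum(dim=0)           # (seq_len, d_model) anchors

class WidthTransmission(nn.Module):
    """Aggregate token anchors into n_slots slots via a doubly-normalized plan,
    a Sinkhorn-style stand-in for the paper's globally optimized plan."""
    def __init__(self, d_model: int, n_slots: int, n_iters: int = 5):
        super().__init__()
        self.slot_queries = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.n_iters = n_iters

    def forward(self, anchors: torch.Tensor) -> torch.Tensor:
        # anchors: (seq_len, d_model)
        plan = (self.slot_queries @ anchors.T).softmax(dim=-1)  # (n_slots, seq_len)
        for _ in range(self.n_iters):  # alternating row/column normalization
            plan = plan / plan.sum(dim=0, keepdim=True)  # spread each token over slots
            plan = plan / plan.sum(dim=1, keepdim=True)  # keep each slot a unit mixture
        return plan @ anchors  # (n_slots, d_model) compressed representations

# Toy usage: 33 layers, 1024 tokens, width 4096, compressed to 16 slots.
L, T, d, K = 33, 1024, 4096, 16
hidden = torch.randn(L, T, d)  # stand-in for frozen LLM hidden states
slots = WidthTransmission(d, K)(DepthTransmission(d)(hidden))
print(slots.shape)  # torch.Size([16, 4096])
```

Under these assumptions, the depth-wise stage keeps every layer explicitly visible to the anchor instead of letting later layers overwrite earlier ones, and the doubly-normalized plan forces slots to share tokens rather than letting several slots collapse onto the same salient span.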