Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio
March 26, 2026
Authors: Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei
cs.AI
Abstract
Soft context compression reduces the computational workload of long-context processing in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply a uniform compression ratio, failing to account for the extreme variance in the information density of natural language. While a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models intrinsically struggle with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this pitfall, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target based on intrinsic information density and quantizes it to a predefined set of discrete compression ratios. The selector is jointly and efficiently trained with the compressor on synthetic data, using summary lengths as proxy labels for compression-ratio prediction. Extensive evaluations confirm that our density-aware framework, built on a mean-pooling backbone, consistently outperforms static baselines, establishing a robust Pareto frontier for context compression. Our code, data, and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress.
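To make the two core ideas concrete, here is a minimal sketch (not the paper's implementation; the discrete ratio set, the ratio predictor, and all function names are illustrative assumptions) of (a) snapping a continuous, density-based ratio prediction onto a predefined discrete set, and (b) compressing a token-embedding sequence via mean pooling at the chosen ratio:

```python
# Illustrative sketch only: the actual Discrete Ratio Selector is a trained
# module; here the "prediction" is just a float we quantize.
import numpy as np

DISCRETE_RATIOS = [2, 4, 8, 16]  # hypothetical predefined ratio set


def quantize_ratio(predicted: float) -> int:
    """Snap a continuous ratio prediction to the nearest discrete ratio."""
    return min(DISCRETE_RATIOS, key=lambda r: abs(r - predicted))


def mean_pool_compress(embeddings: np.ndarray, ratio: int) -> np.ndarray:
    """Compress (seq_len, dim) embeddings into ceil(seq_len / ratio) latent
    tokens by averaging consecutive chunks of `ratio` token embeddings."""
    seq_len, dim = embeddings.shape
    n_chunks = -(-seq_len // ratio)  # ceiling division
    pad = n_chunks * ratio - seq_len
    if pad:
        # zero-pad the tail so the sequence splits evenly into chunks
        embeddings = np.concatenate([embeddings, np.zeros((pad, dim))], axis=0)
    return embeddings.reshape(n_chunks, ratio, dim).mean(axis=1)


tokens = np.random.randn(100, 64)  # 100 token embeddings, dim 64
ratio = quantize_ratio(5.3)        # nearest discrete ratio is 4
latents = mean_pool_compress(tokens, ratio)
print(ratio, latents.shape)        # 4 (25, 64)
```

The point of the quantization step is that the compressor only ever operates at a few fixed ratios, sidestepping the difficulty (noted in the abstract) of conditioning on a continuous, input-dependent structural hyperparameter.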