

Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

March 26, 2026
Authors: Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei
cs.AI

Abstract

Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to account for the extreme variance in natural-language information density. While adopting a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models intrinsically struggle with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target based on intrinsic information density and quantizes it to a predefined set of discrete compression ratios. The selector is jointly and efficiently trained with the compressor on synthetic data, using summary lengths as a proxy to create labels for compression-ratio prediction. Extensive evaluations confirm that our density-aware framework, which uses mean pooling as its backbone, consistently outperforms static baselines, establishing a robust Pareto frontier for context compression techniques. Our code, data, and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress
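The two mechanisms the abstract names, quantizing a continuous compression target to a predefined discrete ratio set, and deriving training labels from summary length, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate ratio set and function names are assumptions.

```python
# Hypothetical sketch of the Discrete Ratio Selector's quantization step.
# The candidate ratio set below is an assumption, not taken from the paper.
DISCRETE_RATIOS = [2, 4, 8, 16, 32]

def quantize_ratio(predicted: float) -> int:
    """Snap a continuous, density-based compression target to the
    nearest ratio in the predefined discrete set."""
    return min(DISCRETE_RATIOS, key=lambda r: abs(r - predicted))

def proxy_ratio_label(context_len: int, summary_len: int) -> int:
    """Summary-length proxy labeling: treat the context-to-summary length
    ratio as the 'ideal' compression target, then quantize it to produce
    a discrete training label for the ratio selector."""
    return quantize_ratio(context_len / max(summary_len, 1))
```

For example, a 1,000-token context whose reference summary is 120 tokens yields a continuous target of about 8.3, which quantizes to the discrete label 8; a denser passage with a 400-token summary would instead map to label 2.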