Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Abstract

Soft context compression reduces the computational cost of processing long contexts in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply a uniform compression ratio and so fail to account for the extreme variance in the information density of natural language. Although a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models intrinsically struggle with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this problem, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target from the intrinsic information density of the input and quantizes it to a predefined set of discrete compression ratios. The selector is trained jointly and efficiently with the compressor on synthetic data, using summary lengths as a proxy to create labels for compression-ratio prediction. Extensive evaluations confirm that our density-aware framework, which uses mean pooling as its backbone, consistently outperforms static baselines and establishes a robust Pareto frontier for context compression techniques. Our code, data, and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress
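
As a rough illustration of the two mechanisms named in the abstract, the sketch below snaps a continuous predicted ratio to the nearest member of a discrete set and then mean-pools token vectors at that ratio. It is a minimal sketch under stated assumptions, not the released implementation: the candidate set `CANDIDATE_RATIOS`, the nearest-value rule with ties going to the smaller ratio, and the function names `quantize_ratio` and `mean_pool` are all hypothetical, since the abstract specifies neither the ratio values nor the quantization rule.

```python
from typing import List, Sequence

# Hypothetical candidate set: the abstract only says "a predefined set of
# discrete compression ratios" and does not list the values.
CANDIDATE_RATIOS = (2, 4, 8, 16)


def quantize_ratio(predicted: float,
                   candidates: Sequence[int] = CANDIDATE_RATIOS) -> int:
    """Snap a continuous predicted ratio to the closest candidate.

    Assumption: nearest-value quantization, ties resolved toward the
    smaller (less aggressive) ratio.
    """
    return min(candidates, key=lambda r: (abs(r - predicted), r))


def mean_pool(embeddings: Sequence[Sequence[float]],
              ratio: int) -> List[List[float]]:
    """Average each run of `ratio` consecutive token vectors into one vector.

    A shorter final run is averaged over the vectors it actually contains.
    """
    pooled = []
    for start in range(0, len(embeddings), ratio):
        window = embeddings[start:start + ratio]
        dim = len(window[0])
        pooled.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return pooled


# Usage: a predicted ratio of 5.2 falls back to the candidate 4, so eight
# token vectors are compressed into two latent vectors.
tokens = [[float(i), float(2 * i)] for i in range(8)]
ratio = quantize_ratio(5.2)
latents = mean_pool(tokens, ratio)
```

Restricting the ratio to a small discrete set means the pooling step only ever sees a handful of window sizes, which is one plausible reading of why the abstract contrasts this design with continuous, input-dependent ratios.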

