Characterizing Prompt Compression Methods for Long Context Inference
July 11, 2024
Authors: Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami
cs.AI
Abstract
Long context inference presents challenges at the system level, with increased
compute and memory requirements, as well as from an accuracy perspective, in
the model's ability to reason over long contexts. Recently, several methods have been
proposed to compress the prompt to reduce the context length. However, there
has been little work on comparing the different proposed methods across
different tasks through a standardized analysis. This has led to conflicting
results. To address this, here we perform a comprehensive characterization and
evaluation of different prompt compression methods. In particular, we analyze
extractive compression, summarization-based abstractive compression, and token
pruning methods. Surprisingly, we find that extractive compression often
outperforms all the other approaches, and enables up to 10x compression with
minimal accuracy degradation. Interestingly, we also find that despite several
recent claims, token pruning methods often lag behind extractive compression.
We only found marginal improvements on summarization tasks.
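To make the extractive approach concrete, here is a minimal illustrative sketch, not the implementation evaluated in the paper: it scores each context sentence against the query with TF-IDF cosine similarity and keeps only the top-scoring sentences, so a keep-ratio of 0.1 corresponds roughly to the 10x compression discussed above. The `compress_prompt` function and its parameters are hypothetical; practical extractive compressors typically replace the TF-IDF scorer with a learned retriever or reranker.

```python
# Illustrative extractive prompt compression via query-conditioned
# sentence selection (a sketch, not the paper's method).
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def compress_prompt(context: str, query: str, keep_ratio: float = 0.1) -> str:
    # Split the long context into sentences (naive regex splitter).
    sentences = re.split(r"(?<=[.!?])\s+", context)
    # Fit TF-IDF on the sentences plus the query so they share a vocabulary.
    vec = TfidfVectorizer().fit(sentences + [query])
    # Cosine similarity of each sentence to the query.
    sims = cosine_similarity(vec.transform([query]), vec.transform(sentences))[0]
    # Keep only the highest-scoring sentences; keep_ratio=0.1 targets ~10x compression.
    budget = max(1, int(len(sentences) * keep_ratio))
    top = sorted(range(len(sentences)), key=lambda i: -sims[i])[:budget]
    # Reassemble the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))
```

Because selection happens before the prompt ever reaches the model, the compressed context preserves exact source sentences, which is one plausible reason extractive methods degrade accuracy less than abstractive summarization or token pruning.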