Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
February 4, 2025
Authors: Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu
cs.AI
Abstract
This paper investigates an under-explored challenge in large language models
(LLMs): the impact of KV cache compression methods on LLMs' fundamental
capabilities. While existing methods achieve impressive compression ratios on
long-context benchmarks, their effects on core model capabilities remain
understudied. We present a comprehensive empirical study evaluating prominent
KV cache compression methods across diverse tasks, spanning world knowledge,
commonsense reasoning, arithmetic reasoning, code generation, safety, and
long-context understanding and generation. Our analysis reveals that KV cache
compression methods exhibit task-specific performance degradation. Arithmetic
reasoning tasks prove particularly sensitive to aggressive compression, with
different methods showing performance drops of 17.4%-43.3%. Notably, the
DeepSeek R1 Distill model exhibits more robust compression tolerance compared
to instruction-tuned models, showing only 9.67%-25.53% performance
degradation. Based on our analysis of attention patterns and cross-task
compression performance, we propose ShotKV, a novel compression approach that
distinctly handles prefill and decoding phases while maintaining shot-level
semantic coherence. Empirical results show that ShotKV achieves 9%-18%
performance improvements on long-context generation tasks under aggressive
compression ratios.
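The abstract describes ShotKV only at a high level: eviction decisions are made at the granularity of whole shots (in-context examples) rather than individual tokens, so each retained example stays semantically intact. As a rough illustration of what shot-level selection could look like, here is a minimal Python sketch. The function name `shot_level_keep`, the use of mean attention mass as the shot score, and the greedy whole-shot budget are assumptions of this sketch, not the paper's exact algorithm.

```python
import numpy as np

def shot_level_keep(attn_scores: np.ndarray,
                    shot_spans: list[tuple[int, int]],
                    budget: int) -> list[int]:
    """Select which prompt token positions to keep in the KV cache.

    Instead of evicting individual tokens, whole shots (few-shot
    examples, given as [start, end) spans) are ranked by their mean
    attention mass and kept greedily until the token budget is spent.

    attn_scores: per-token attention mass aggregated over heads and
                 queries, shape (seq_len,). How this aggregation is
                 done is an assumption of this sketch.
    """
    # Score each shot by the average attention its tokens receive,
    # so short shots are not unfairly penalized against long ones.
    ranked = sorted(
        shot_spans,
        key=lambda span: attn_scores[span[0]:span[1]].mean(),
        reverse=True,
    )

    kept: list[int] = []
    for start, end in ranked:
        length = end - start
        if len(kept) + length > budget:
            continue  # a shot is kept whole or not at all
        kept.extend(range(start, end))
    return sorted(kept)


# Toy usage: three shots of four tokens each, budget of eight tokens.
rng = np.random.default_rng(0)
scores = rng.random(12)
spans = [(0, 4), (4, 8), (8, 12)]
print(shot_level_keep(scores, spans, budget=8))
```

Keeping shots whole, rather than dropping scattered tokens, is what preserves the shot-level semantic coherence the abstract highlights as the key property of ShotKV.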