The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
February 24, 2025
Authors: Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li
cs.AI
Abstract
Motivated by the goal of reducing the computational and storage costs of LLMs,
model compression and KV cache compression have attracted much attention from
researchers. However, current methods predominantly emphasize maintaining the
performance of compressed LLMs, as measured by perplexity or by simple accuracy
on common-sense knowledge QA and basic arithmetic reasoning tasks. In this
blog, we present a brief review of recent advancements in LLMs related to
retrieval-augmented generation, multi-step reasoning, external tools, and
computational expressivity, all of which substantially enhance LLM performance.
Then, we propose the lottery LLM hypothesis: for a given LLM and task, there
exists a smaller lottery LLM capable of achieving the same performance as the
original LLM with the assistance of multi-step reasoning and
external tools. Based on the review of current progress in LLMs, we discuss and
summarize the essential capabilities that the lottery LLM and KV cache
compression must possess, which are currently overlooked in existing methods.
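To make the hypothesis concrete, here is a minimal, purely illustrative Python sketch (not from the paper): a toy "large model" answers from a big parametric memory, while a toy "lottery LLM" with far less memorized knowledge recovers the same answers by routing each step to an external calculator tool or a retrieval store. All names here (large_model, lottery_llm, calculator, RETRIEVAL_STORE) are hypothetical stand-ins, not real LLM APIs.

```python
# Purely illustrative sketch of the lottery LLM hypothesis.
# A toy "large model" answers from a big parametric memory; a toy
# "lottery LLM" with less memorized knowledge matches it by calling
# an external calculator tool and a retrieval store instead.
import ast
import operator

# Toy "large model": everything is memorized in its parameters.
LARGE_MODEL_MEMORY = {
    "capital of France": "Paris",
    "17 * 23": "391",
}

def large_model(question: str) -> str:
    return LARGE_MODEL_MEMORY.get(question, "unknown")

# External tool: a safe arithmetic evaluator (the "calculator").
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expr: str) -> str:
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))

# External knowledge: a retrieval store with facts the small model lacks.
RETRIEVAL_STORE = {"capital of France": "Paris"}

def lottery_llm(question: str) -> str:
    """Toy "lottery LLM": small memory, but it proceeds in steps and
    routes each question to the right external resource."""
    if any(op in question for op in "+*/"):  # step 1: arithmetic -> tool call
        return calculator(question)
    if question in RETRIEVAL_STORE:          # step 2: factual -> retrieval
        return RETRIEVAL_STORE[question]
    return "unknown"                         # step 3: otherwise, abstain

for q in ["capital of France", "17 * 23"]:
    assert lottery_llm(q) == large_model(q)
print("toy lottery LLM + tools matches the large model on this task")
```

The point of the sketch is the division of labor it illustrates: if a smaller model can reach the original model's answers through multi-step routing, retrieval, and tool calls, then compression should preserve those orchestration abilities even when memorized knowledge is lost.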