The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Abstract

Model compression and KV cache compression have attracted considerable research attention as ways to reduce the computational and storage costs of large language models (LLMs). However, current methods focus predominantly on maintaining the performance of compressed LLMs as measured by perplexity or by simple accuracy on commonsense knowledge QA and basic arithmetic reasoning tasks. In this blog post, we briefly review recent advances in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose the lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of matching the performance of the original LLM with the assistance of multi-step reasoning and external tools. Based on this review, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess and that existing methods currently overlook.
