Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Abstract

Large language models (LLMs) are widely used to tackle complex tasks through autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm for injecting procedural knowledge into LLM applications. Because popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. Text compression techniques could address this problem, but most existing methods are designed to compress factual knowledge in documents rather than procedural knowledge, which makes them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) adapt to varying complexity across skills. To address this, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. SKIM generates a different number of soft tokens for each skill depending on its complexity, which improves the efficiency of LLM inference while preserving the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.