Adaptive Multi-Resolution Compression of Procedural Knowledge for Large Language Models

Abstract

Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm to inject procedural knowledge into LLM applications. Since popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. While text compression techniques have the potential to solve this problem, most existing methods are designed to compress factual knowledge in documents instead of procedural knowledge, making them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) be adaptable to varying complexities across skills. To address this, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. Depending on the complexity of each skill, SKIM creates different numbers of soft tokens that not only improve the efficiency of LLM inference, but also preserve the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.
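The abstract does not specify how SKIM scores a skill's complexity or maps that score to a number of soft tokens. Purely to illustrate the budget arithmetic behind "adaptive multi-resolution" compression, the sketch below assumes a hypothetical complexity score in [0, 1] and three hypothetical resolution levels spanning the reported 30 to 60 percent range. The names `RESOLUTION_PERCENTS`, `pick_percent`, `soft_token_budget`, and `prefill_tokens_saved` are illustrative and are not taken from the released code.

```python
# Hypothetical resolution levels, as a percentage of the original token length
# kept as soft tokens. The 30-60 range mirrors the abstract; the exact levels
# and thresholds below are assumptions for illustration, not SKIM's settings.
RESOLUTION_PERCENTS = (30, 45, 60)


def pick_percent(complexity: float) -> int:
    """Map a hypothetical complexity score in [0, 1] to a resolution level."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if complexity < 1 / 3:
        return RESOLUTION_PERCENTS[0]  # simple skill: coarsest resolution
    if complexity < 2 / 3:
        return RESOLUTION_PERCENTS[1]
    return RESOLUTION_PERCENTS[2]  # complex skill: keep more soft tokens


def soft_token_budget(num_tokens: int, complexity: float) -> int:
    """Soft tokens allotted to a skill whose full text has `num_tokens` tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    percent = pick_percent(complexity)
    # Integer ceiling of num_tokens * percent / 100, avoiding float rounding.
    return max(1, -(-num_tokens * percent // 100))


def prefill_tokens_saved(num_tokens: int, complexity: float, invocations: int) -> int:
    """Prefill tokens avoided when the compressed skill is used `invocations` times."""
    return (num_tokens - soft_token_budget(num_tokens, complexity)) * invocations


if __name__ == "__main__":
    for score in (0.2, 0.5, 0.9):
        print(score, soft_token_budget(1000, score))
    print(prefill_tokens_saved(1000, 0.2, 50))
```

Under these assumptions, a 1,000-token skill with a complexity score of 0.2 is allotted 300 soft tokens, so 50 invocations avoid 35,000 prefill tokens. Integer arithmetic is used for the ceiling so that budgets are not perturbed by floating-point rounding.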