Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Abstract

Large language models (LLMs) are widely used to tackle complex tasks through autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm for injecting procedural knowledge into LLM applications. Because popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. Text compression techniques could address this problem, but most existing methods are designed to compress factual knowledge in documents rather than procedural knowledge, which makes them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve the logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) adapt to the varying complexity of different skills. To this end, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. SKIM generates a different number of soft tokens for each skill depending on its complexity, which improves the efficiency of LLM inference while preserving the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30% to 60% of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.