Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Abstract

Large language models (LLMs) are widely used to tackle complex tasks through autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm for injecting procedural knowledge into LLM applications. Because popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. Text compression techniques have the potential to solve this problem, but most existing methods are designed to compress the factual knowledge in documents rather than procedural knowledge, which makes them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) adapt to the varying complexity of different skills. To meet these requirements, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. SKIM generates a different number of soft tokens for each skill depending on its complexity, which improves the efficiency of LLM inference while preserving the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.