LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Abstract

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving the ability to load, scale, and compose skills modularly. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: on ALFWorld it improves the success rate by 21.4 and 13.4 percentage points on the seen and unseen splits, respectively, with 64.1% fewer prefill tokens, and on Search-QA it improves exact match by 3.0 percentage points with 72.2% lower skill-token overhead. Further analysis shows that the generated skill LoRAs form a structured semantic geometry, that their effect can be precisely controlled via the LoRA scaling coefficient, and that they can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.
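To make the weight-space mechanics concrete, the following is a minimal plain-Python sketch of how a skill LoRA could be loaded, scaled, and composed on a single frozen linear layer. It is not the LatentSkill implementation. `hypothetical_hypernetwork` is an untrained, illustrative stand-in for the pretrained hypernetwork (it maps a skill-embedding vector to LoRA factors A and B with fixed arithmetic), and composition is assumed to take the common form of summing scaled weight deltas, ΔW = s·BA; the paper's exact composition rule may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain-Python matrix-vector product m @ x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def add(m1: Matrix, m2: Matrix) -> Matrix:
    """Elementwise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def lora_delta(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Weight delta scale * B @ A, with A of shape (r x d_in), B of shape (d_out x r)."""
    return [[scale * v for v in row] for row in matmul(b, a)]


def hypothetical_hypernetwork(skill_embedding: Vector, d_in: int, d_out: int,
                              rank: int) -> Tuple[Matrix, Matrix]:
    """HYPOTHETICAL stand-in for a pretrained hypernetwork: each LoRA factor
    entry is a fixed linear function of the skill embedding (not learned)."""
    def entry(i: int) -> float:
        return sum(e * ((i + j) % 3 - 1) for j, e in enumerate(skill_embedding))
    a = [[entry(r * d_in + c) for c in range(d_in)] for r in range(rank)]
    offset = rank * d_in
    b = [[entry(offset + o * rank + r) for r in range(rank)] for o in range(d_out)]
    return a, b


def adapted_forward(w: Matrix, x: Vector,
                    adapters: List[Tuple[Matrix, Matrix, float]]) -> Vector:
    """y = W x + sum_i scale_i * B_i (A_i x). The base weight W stays frozen;
    each skill adapter can be loaded, rescaled, or dropped independently."""
    y = matvec(w, x)
    for a, b, scale in adapters:
        dy = matvec(b, matvec(a, x))
        y = [yi + scale * di for yi, di in zip(y, dy)]
    return y
```

Because every adapter enters the layer linearly, merging the scaled deltas into W and running the adapters side by side give the same output, and a scaling coefficient of 0 recovers the base model. For example, `add(add(W, lora_delta(a1, b1, 0.7)), lora_delta(a2, b2, 0.3))` yields a single merged weight that applies both skills at once, with no skill text in the prompt.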