LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Abstract

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving modular loading, scaling, and composition. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: it improves ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits with 64.1% fewer prefill tokens, and improves Search-QA exact match by 3.0 points with 72.2% lower skill-token overhead. Further analysis shows that generated skill LoRAs form a structured semantic geometry, can be precisely controlled via the LoRA scaling coefficient, and can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.
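To make the scaling and composition claims concrete, the following is a minimal sketch, not the LatentSkill implementation. It assumes the standard LoRA form, in which a frozen weight W0 is adjusted by a low-rank update alpha * B A, and it assumes that "parameter-space arithmetic" can be illustrated as a weighted sum of such updates. The hypernetwork is not modeled: the two adapters below are hand-written toy values, and the names `SkillLoRA`, `apply_skills`, `pick_up`, and `place` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every entry of a by the scalar s."""
    return [[s * x for x in row] for row in a]


@dataclass
class SkillLoRA:
    """Hypothetical container for one skill adapter (rank r)."""
    B: Matrix  # shape d_out x r
    A: Matrix  # shape r x d_in

    def delta(self, alpha: float = 1.0) -> Matrix:
        """Low-rank weight update alpha * (B @ A)."""
        return scale(matmul(self.B, self.A), alpha)


def apply_skills(w0: Matrix, skills: Sequence[SkillLoRA], alphas: Sequence[float]) -> Matrix:
    """Return W0 + sum_i alpha_i * B_i A_i without modifying w0."""
    w = [row[:] for row in w0]
    for skill, alpha in zip(skills, alphas):
        w = add(w, skill.delta(alpha))
    return w


# Toy 2x2 base weight and two rank-1 adapters (illustrative values only).
w0 = [[1.0, 0.0], [0.0, 1.0]]
pick_up = SkillLoRA(B=[[1.0], [0.0]], A=[[0.0, 2.0]])
place = SkillLoRA(B=[[0.0], [1.0]], A=[[3.0, 0.0]])

half = apply_skills(w0, [pick_up], [0.5])              # one skill at half strength
both = apply_skills(w0, [pick_up, place], [1.0, 1.0])  # two skills summed
```

In this sketch, setting a coefficient to 0 recovers the base weights, which is the sense in which the scaling coefficient acts as a control knob. Whether a plain sum composes two real skills well depends on the alignment condition the abstract mentions, which this toy example does not test.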