Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
April 7, 2026
Authors: Dawei Li, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, Lichao Sun
cs.AI
Abstract
Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces two key challenges. First, loading the full skill set saturates the context window, driving up token costs, hallucination, and latency.
In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full-skill-loading baseline while reducing input tokens by 37.8%, and it generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies on skill libraries ranging from 200 to 2,000 skills further show that GoS consistently outperforms both vanilla skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
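The retrieval step described above can be illustrated with a minimal sketch: seed skills matched to the query carry the restart mass of a Personalized PageRank run over the skill dependency graph, and the highest-ranked skills are then packed under a token budget. All names, the toy graph, and the greedy budget packing below are hypothetical; the paper's actual seeding, edge weighting, and hydration mechanics are not reproduced here.

```python
# Hypothetical sketch of dependency-aware skill retrieval:
# Personalized PageRank over a skill graph, then budget-bounded selection.

def personalized_pagerank(graph, seeds, alpha=0.15, iters=50):
    """Power iteration for Personalized PageRank.

    graph: dict node -> list of successor nodes (dependency edges)
    seeds: dict node -> preference weight (restart distribution)
    alpha: restart (teleport) probability
    Mass at dangling nodes is simply dropped; relative ordering,
    which is all the selection step needs, is preserved.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    pref = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(pref)
    for _ in range(iters):
        nxt = {n: alpha * pref[n] for n in nodes}  # restart to seed skills
        for n in nodes:
            out = graph[n]
            if not out:
                continue
            share = (1 - alpha) * rank[n] / len(out)
            for m in out:  # spread mass along dependency edges
                nxt[m] += share
        rank = nxt
    return rank

def select_bundle(rank, token_cost, budget):
    """Greedily keep the highest-ranked skills that fit the context budget."""
    bundle, used = [], 0
    for skill in sorted(rank, key=rank.get, reverse=True):
        if used + token_cost[skill] <= budget:
            bundle.append(skill)
            used += token_cost[skill]
    return bundle
```

With a seed skill `a` that depends on `b`, which depends on `c`, the bundle stays small and dependency-aware: unrelated skills receive no rank mass and are never loaded, which is the token saving the abstract reports.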