Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Abstract


Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces two key challenges. First, loading the full skill set saturates the context window, driving up token costs, hallucination, and latency. In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full skill-loading baseline while reducing input tokens by 37.8%, and generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies across skill libraries ranging from 200 to 2,000 skills further demonstrate that GoS consistently outperforms both vanilla skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
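To make the retrieval pipeline named in the abstract more concrete, the following is a minimal sketch of the general idea, not the GoS implementation. It assumes seed scores are already available (standing in for the hybrid semantic-lexical seeding step), runs a plain power-iteration Personalized PageRank over a toy dependency graph, and then greedily fills a token budget. The abstract does not define "reverse-weighted", so the sketch simply lets rank mass flow from a skill to its prerequisites with hand-picked weights. All skill names, weights, token costs, and the functions `personalized_pagerank` and `select_bundle` are hypothetical.

```python
def personalized_pagerank(edges, seeds, alpha=0.15, iters=50):
    """Power-iteration Personalized PageRank.

    edges: dict node -> {neighbor: weight}; rank mass flows node -> neighbor.
    seeds: dict node -> non-negative seed score (the restart distribution).
    alpha: restart probability (assumed value, not taken from the paper).
    """
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("at least one seed needs a positive score")
    nodes = set(edges) | set(seeds)
    for nbrs in edges.values():
        nodes |= set(nbrs)
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            out = edges.get(u, {})
            w_sum = sum(out.values())
            if w_sum == 0:
                # Dangling node: hand its mass back to the seed distribution.
                for n in nodes:
                    nxt[n] += (1 - alpha) * rank[u] * restart[n]
            else:
                for v, w in out.items():
                    nxt[v] += (1 - alpha) * rank[u] * w / w_sum
        rank = nxt
    return rank


def select_bundle(rank, token_cost, budget):
    """Greedily keep the highest-ranked skills that still fit the budget."""
    bundle, used = [], 0
    for skill in sorted(rank, key=lambda s: (-rank[s], s)):
        if rank[skill] <= 0:
            continue  # never reached from any seed
        cost = token_cost[skill]
        if used + cost <= budget:
            bundle.append(skill)
            used += cost
    return bundle, used


# Hypothetical toy library: deploy_app needs build_image, which needs read_config.
edges = {
    "deploy_app": {"build_image": 1.0},
    "build_image": {"read_config": 1.0},
    "send_email": {},
}
seeds = {"deploy_app": 1.0}  # stand-in for a hybrid semantic-lexical match
token_cost = {"deploy_app": 300, "build_image": 400,
              "read_config": 200, "send_email": 500}

rank = personalized_pagerank(edges, seeds)
bundle, used = select_bundle(rank, token_cost, budget=1000)
print(bundle, used)
```

In this toy run the seed skill pulls in both of its prerequisites, while the unrelated `send_email` skill receives no rank mass and stays out of the context, which is the behavior a dependency-aware bundle is meant to have. A plain top-k vector lookup on the query alone would have no reason to surface `read_config`.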
