SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
April 2, 2026
Authors: Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
cs.AI
Abstract
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. To this end, we introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 employs a training-time curriculum that begins with full skill context and progressively withdraws it: skills are grouped offline by category and rendered with the interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum module then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% on ALFWorld and +6.6% on Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
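The abstract's Dynamic Curriculum idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the helpfulness scores (e.g. a reward delta with vs. without a skill in context), and the greedy selection under a linearly decaying token budget are all assumptions made for exposition.

```python
# Hypothetical sketch: keep only skill files the current policy still
# benefits from, subject to a skill-context budget that decays linearly
# to zero over training (ending in fully zero-shot operation).

def skill_budget(step: int, total_steps: int, initial_budget: int) -> int:
    """Linearly decay the skill-context token budget toward zero."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return int(initial_budget * remaining)

def select_skills(skills, helpfulness, budget):
    """Greedily retain the most helpful skill files that fit the budget.

    skills: list of (name, token_cost) tuples
    helpfulness: dict name -> measured on-policy gain for that skill
    """
    ranked = sorted(skills, key=lambda s: helpfulness.get(s[0], 0.0), reverse=True)
    kept, used = [], 0
    for name, cost in ranked:
        if helpfulness.get(name, 0.0) <= 0.0:
            continue  # drop skills the policy no longer benefits from
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept

# Illustrative skill files with token costs and measured on-policy gains.
skills = [("navigate.md", 300), ("search_api.md", 250), ("cleanup.md", 200)]
gains = {"navigate.md": 0.12, "search_api.md": 0.05, "cleanup.md": -0.01}

# Early in training the budget admits all still-helpful skills;
# late in training the shrinking budget forces zero-shot behavior.
early = select_skills(skills, gains, skill_budget(0, 100, 600))   # ['navigate.md', 'search_api.md']
late = select_skills(skills, gains, skill_budget(90, 100, 600))   # []
```

The greedy ranking stands in for whatever selection rule the paper actually uses; the point is only that the retained context shrinks monotonically with the budget.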