SkillOS: Learning Skill Curation for Self-Evolving Agents

May 7, 2026
Authors: Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
cs.AI

Abstract

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
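
The grouped task stream described in the abstract, where earlier trajectories update the SkillRepo and later related tasks evaluate those updates, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `rollout_group`, `toy_executor`, `toy_curator`, the `split` parameter, and the dictionary-backed `SkillRepo` are all hypothetical stand-ins, and the returned mean success is only one possible delayed signal, since the abstract does not specify the components of the composite reward.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SkillRepo:
    """Toy stand-in for the external skill store: skill name -> Markdown text."""
    skills: Dict[str, str] = field(default_factory=dict)


# Hypothetical signatures (assumptions, not taken from the paper):
# executor(task, repo) -> (trajectory, success score in [0, 1]); never edits repo
# curator(trajectory, repo) -> None; edits repo in place
Executor = Callable[[str, SkillRepo], Tuple[str, float]]
Curator = Callable[[str, SkillRepo], None]


def rollout_group(tasks: List[str], split: int,
                  executor: Executor, curator: Curator) -> float:
    """Run one group of related tasks and return mean success on the later ones.

    The first `split` tasks yield trajectories that the curator uses to update
    the repo; the remaining tasks only evaluate the updated repo.
    """
    repo = SkillRepo()
    for task in tasks[:split]:
        trajectory, _ = executor(task, repo)
        curator(trajectory, repo)  # curation happens only on earlier tasks
    scores = [executor(task, repo)[1] for task in tasks[split:]]
    return sum(scores) / len(scores) if scores else 0.0


def toy_executor(task: str, repo: SkillRepo) -> Tuple[str, float]:
    # Succeeds only if a skill for the task's family is already stored.
    family = task.split(":")[0]
    success = 1.0 if family in repo.skills else 0.0
    return f"{family}: attempted {task}", success


def toy_curator(trajectory: str, repo: SkillRepo) -> None:
    # Stores one Markdown note per task family seen in a trajectory.
    family = trajectory.split(":")[0]
    repo.skills[family] = f"# {family}\n\nNotes distilled from: {trajectory}"


def noop_curator(trajectory: str, repo: SkillRepo) -> None:
    # Baseline that never writes a skill.
    return None
```

In this toy setup, a curator that writes useful skills raises the score on the later tasks of a group, while a curator that writes nothing does not. That difference is the kind of indirect, delayed feedback a curation policy would have to learn from.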