SkillOS: Learning Skill Curation for Self-Evolving Agents

May 7, 2026
Authors: Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
cs.AI

Abstract

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
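The abstract describes the training setup only at a high level. Below is a minimal Python sketch, built on our own assumptions, of how a grouped task stream can turn delayed feedback into a scalar signal for the curator. In each group, the earlier tasks drive repository edits and the later, related tasks score those edits. Everything named here is hypothetical and for illustration only: `SkillRepo`, `run_group`, the add/update/delete operation set, the word-overlap retrieval, the `size_penalty` term in the reward, and the toy executor and curator. None of these are SkillOS's actual components or its composite reward. In the paper, the executor is a frozen LLM agent and the curator is trained with RL.

```python
from dataclasses import dataclass, field
from typing import Callable

Trajectory = dict  # hypothetical record, e.g. {"task": str, "success": bool}
Op = tuple[str, str, str]  # hypothetical (operation, skill name, Markdown body)


@dataclass
class SkillRepo:
    """External skill store mapping a skill name to a Markdown document."""
    skills: dict[str, str] = field(default_factory=dict)

    def apply(self, op: str, name: str, body: str = "") -> None:
        # Assumed operation set; the abstract does not enumerate SkillOS's operations.
        if op in ("add", "update"):
            self.skills[name] = body
        elif op == "delete":
            self.skills.pop(name, None)
        else:
            raise ValueError(f"unknown op: {op}")

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        # Toy retrieval: rank skills by word overlap with the task text.
        words = set(task.lower().split())

        def overlap(item: tuple[str, str]) -> int:
            name, body = item
            return len(words & set(f"{name} {body}".lower().split()))

        ranked = sorted(self.skills.items(), key=overlap, reverse=True)
        return [body for _, body in ranked[:k]]


def run_group(
    repo: SkillRepo,
    executor: Callable[[str, list[str]], Trajectory],
    curator: Callable[[Trajectory, SkillRepo], list[Op]],
    group: list[str],
    split: int,
    size_penalty: float = 0.01,
) -> float:
    """Run one grouped task stream and return a scalar reward for the curator."""
    earlier, later = group[:split], group[split:]
    # Phase 1: the frozen executor solves earlier tasks; the curator edits the repo.
    for task in earlier:
        traj = executor(task, repo.retrieve(task))
        for op, name, body in curator(traj, repo):
            repo.apply(op, name, body)
    # Phase 2: later, related tasks evaluate the updated repo.
    results = [executor(task, repo.retrieve(task))["success"] for task in later]
    success_rate = sum(results) / len(results) if results else 0.0
    # Hypothetical composite reward: downstream success minus a repo-size penalty.
    return success_rate - size_penalty * len(repo.skills)


def toy_executor(task: str, skills: list[str]) -> Trajectory:
    # Frozen stand-in: succeeds if a retrieved skill mentions the task's first word.
    key = task.split()[0]
    return {"task": task, "success": any(key in s for s in skills)}


def toy_curator(traj: Trajectory, repo: SkillRepo) -> list[Op]:
    # Rule-based stand-in for the trainable curator: distill a skill from each failure.
    if traj["success"]:
        return []
    key = traj["task"].split()[0]
    return [("add", key, f"# {key}\nDistilled from: {traj['task']}")]


repo = SkillRepo()
reward = run_group(repo, toy_executor, toy_curator,
                   ["sort numbers ascending", "sort names alphabetically"], split=1)
print(round(reward, 2), list(repo.skills))  # 0.99 ['sort']
```

The sketch illustrates delayed credit assignment. A curator edit is never scored on the task that produced it. It is scored only on the later tasks that reuse it. In an actual RL setup, the returned scalar would serve as the learning signal for updating the curator policy. That update step is omitted here.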