OpenSkill: Open-World Self-Evolution for LLM Agents

June 4, 2026
Authors: Zhiling Yan, Dingjie Song, Hanrong Zhang, Wei Liang, Yuxuan Zhang, Yutong Dai, Lifang He, Philip S. Yu, Ran Xu, Xiang Li, Lichao Sun
cs.AI

Abstract

Self-evolving agents need to adapt after deployment, but existing approaches assume a usable learning loop is already in place, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-world self-evolution, where an agent must build both its skills and its own verification signals from scratch, using open-world resources but no target-task supervision. We propose OpenSkill, a framework that bootstraps this loop: it acquires grounded knowledge and verification anchors from documentation, repositories, and the web, synthesizes them into transferable skills, and refines those skills against self-built virtual tasks grounded in the anchors rather than in target answers. The open world thus supplies both the knowledge to be learned and a supervision-independent practice environment, with target-task supervision reserved for final evaluation. Across three benchmarks and two target agents, OpenSkill attains the best automated pass rate while satisfying the no-supervision constraint. Analysis shows that its skills transfer across models without model-specific adaptation, and that its self-built verifier aligns with ground-truth outcomes despite never accessing them.
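
The loop described in the abstract (acquire anchors, synthesize a skill, refine it against self-built virtual tasks) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Anchor`, `Skill`, `acquire`, `build_virtual_tasks`, `attempt`, `refine`) is hypothetical, the "open-world resources" are a hard-coded list of strings, and the "skill" is a plain lookup table standing in for whatever an LLM agent would actually learn. The one property it tries to preserve is that verification uses only the harvested anchors, never target-task answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Anchor:
    """A checkable fact harvested from open-world resources (hypothetical type)."""
    question: str
    expected: str


@dataclass
class Skill:
    """Toy stand-in for a transferable skill: a table of notes the agent consults."""
    notes: Dict[str, str] = field(default_factory=dict)


def acquire(resources: List[str]) -> List[Anchor]:
    """Stand-in for retrieval from docs, repos, and the web.

    Assumes each usable resource line has the form 'question => answer';
    lines without '=>' are skipped.
    """
    anchors = []
    for line in resources:
        question, sep, answer = line.partition("=>")
        if sep:
            anchors.append(Anchor(question.strip(), answer.strip()))
    return anchors


def build_virtual_tasks(anchors: List[Anchor]) -> List[Tuple[str, Callable[[str], bool]]]:
    """Turn each anchor into (prompt, verifier); the verifier checks the anchor only."""
    return [(a.question, lambda out, a=a: out == a.expected) for a in anchors]


def attempt(skill: Skill, prompt: str) -> str:
    """Toy agent: answer from the skill's notes, or return an empty string."""
    return skill.notes.get(prompt, "")


def refine(skill: Skill, anchors: List[Anchor], max_rounds: int = 3) -> float:
    """Practice on virtual tasks and patch the skill wherever the verifier fails.

    Returns the pass rate on the virtual tasks in the last round that was run.
    """
    tasks = build_virtual_tasks(anchors)
    grounded = {a.question: a.expected for a in anchors}
    rate = 0.0
    for _ in range(max_rounds):
        failures = [p for p, verify in tasks if not verify(attempt(skill, p))]
        rate = 1 - len(failures) / len(tasks) if tasks else 0.0
        if not failures:
            break
        for prompt in failures:
            # Toy "refinement": copy the grounded knowledge into the skill.
            skill.notes[prompt] = grounded[prompt]
    return rate


resources = [
    "HTTP status code for Not Found => 404",
    "Default SSH port => 22",
    "a line with no usable anchor",
]
skill = Skill()
anchors = acquire(resources)
rate = refine(skill, anchors)
```

In this sketch the first round fails every virtual task because the skill starts empty, and the second round passes them once the notes are filled in. A real system would replace the copy step with skill synthesis by a model, and would hold out target-task supervision for the final evaluation only, as the abstract states.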