SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Abstract

Repurposing large vision-language models (LVLMs) as computer use agents (CUAs) has led to substantial breakthroughs, driven primarily by human-labeled data. However, these models often struggle with novel and specialized software, particularly when no human annotations are available. To address this challenge, we propose SEAgent, an agentic self-evolving framework that enables CUAs to evolve autonomously through interaction with unfamiliar software. Specifically, SEAgent lets computer-use agents master novel software environments through experiential learning: agents explore new software, learn by iterative trial and error, and progressively tackle auto-generated tasks ordered from simple to complex. To this end, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that produces increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, comprising adversarial imitation of failure actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates the individual experiential insights of specialist agents, yielding a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately outperforms ensembles of individual specialist agents on their own specialized software. We validate SEAgent across five novel software environments within OS-World. Our approach raises the success rate from 11.3% to 34.5%, an improvement of 23.2 percentage points over a competitive open-source CUA, UI-TARS.
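The abstract names Group Relative Policy Optimization (GRPO) without spelling out its core step. As a minimal sketch, and not the authors' implementation, the group-relative advantage usually associated with GRPO scores each sampled action against the mean and standard deviation of its own sampling group. The function name, the `eps` term, and the binary rewards below are assumptions made for illustration only.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own group.

    Hypothetical helper: `rewards` holds one scalar reward per action
    sampled for the same task state. `eps` is an assumed stabilizer that
    avoids division by zero when every reward in the group is identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Assumed example: four sampled actions, two judged successful (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Actions that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline. When every reward in a group is equal, all advantages are zero and that group contributes no learning signal.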
