SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Abstract

Repurposing large vision-language models (LVLMs) as computer use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly when no human annotations are available. To address this challenge, we propose SEAgent, an agentic self-evolving framework that enables CUAs to evolve autonomously through interaction with unfamiliar software. Specifically, SEAgent lets computer use agents master novel software environments through experiential learning: agents explore new software, learn by iterative trial and error, and progressively tackle auto-generated tasks ordered from simple to complex. To this end, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that produces increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, which comprises adversarial imitation of failed actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates the experiential insights of individual specialist agents, yielding a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately outperforms ensembles of individual specialist agents on their own specialized software. We validate SEAgent across five novel software environments within OS-World. Our approach improves the success rate by 23.2 percentage points, from 11.3% to 34.5%, over a competitive open-source CUA, UI-TARS.
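To make the GRPO step mentioned in the abstract concrete, the sketch below shows the group-relative advantage that gives the method its name: each sampled action's reward is normalized against the mean and standard deviation of its own sampling group. This is the commonly described form of GRPO, not necessarily SEAgent's exact implementation. The function name, the example rewards, the choice of population standard deviation, and the `eps` constant are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation is used; implementations
    may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four actions sampled at the same step,
# e.g. 1.0 if a judge marks the action as correct and 0.0 otherwise.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
```

Actions that score above their group's mean receive a positive advantage and the rest a negative one, so no separate learned value function is needed to form a baseline.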

