
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

January 22, 2026
Authors: Taofeng Xue, Chong Peng, Mianqiu Huang, Linsen Guo, Tiancheng Han, Haozhe Wang, Jianing Wang, Xiaocheng Zhang, Xin Yang, Dengchang Zhao, Jinrui Ding, Xiandi Ma, Yuchen Xie, Peng Pei, Xunliang Cai, Xipeng Qiu
cs.AI

Abstract

The development of native computer-use agents (CUAs) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms, which rely primarily on passive imitation of static datasets, struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work, we introduce EvoCUA, a native computer-use agentic model. Unlike static imitation, EvoCUA integrates data generation and policy optimization into a self-sustaining evolutionary cycle. To mitigate data scarcity, we develop a verifiable synthesis engine that autonomously generates diverse tasks coupled with executable validators. To enable large-scale experience acquisition, we design a scalable infrastructure that orchestrates tens of thousands of asynchronous sandbox rollouts. Building on these massive trajectories, we propose an iterative evolving learning strategy to efficiently internalize this experience. This mechanism dynamically regulates policy updates by identifying capability boundaries, reinforcing successful routines while transforming failure trajectories into rich supervision through error analysis and self-correction. Empirical evaluations on the OSWorld benchmark demonstrate that EvoCUA achieves a success rate of 56.7%, establishing a new open-source state of the art. Notably, EvoCUA significantly outperforms the previous best open-source model, OpenCUA-72B (45.0%), and surpasses leading closed-weights models such as UI-TARS-2 (53.1%). Crucially, our results underscore the generalizability of this approach: the evolving paradigm driven by learning from experience yields consistent performance gains across foundation models of varying scales, establishing a robust and scalable path for advancing native agent capabilities.
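
The cycle described in the abstract (tasks paired with executable validators, repeated rollouts, and updates focused on the capability boundary) can be illustrated with a toy sketch. This is not EvoCUA's implementation: the abstract gives no algorithmic details, so every name here (`Task`, `make_task`, `rollout`, `partition_by_pass_rate`, `evolve`), the integer "skill" standing in for a policy, and the rule that a task counts as "boundary" when its pass rate is strictly between 0 and 1 are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    """A synthetic task paired with an executable validator (hypothetical shape)."""
    name: str
    target: int
    validator: Callable[[int], bool]


def make_task(name: str, target: int) -> Task:
    # The validator checks the final state programmatically, with no human labels.
    return Task(name=name, target=target,
                validator=lambda final_state: final_state == target)


def rollout(policy_skill: int, task: Task, attempt: int) -> int:
    """Toy stand-in for one sandbox rollout; returns the final state reached.

    Assumed behaviour: the policy always solves tasks at or below its skill,
    solves tasks one level above its skill on even-numbered attempts only,
    and never solves anything harder.
    """
    if task.target <= policy_skill:
        return task.target
    if task.target == policy_skill + 1 and attempt % 2 == 0:
        return task.target
    return -1  # failed trajectory


def partition_by_pass_rate(policy_skill: int, tasks: List[Task],
                           attempts: int = 4) -> Dict[str, List[str]]:
    """Bucket tasks by validator pass rate over several rollouts."""
    buckets: Dict[str, List[str]] = {"mastered": [], "boundary": [], "too_hard": []}
    for task in tasks:
        passes = sum(task.validator(rollout(policy_skill, task, a))
                     for a in range(attempts))
        if passes == attempts:
            buckets["mastered"].append(task.name)
        elif passes == 0:
            buckets["too_hard"].append(task.name)
        else:
            buckets["boundary"].append(task.name)  # mixed outcomes
    return buckets


def evolve(policy_skill: int, tasks: List[Task],
           rounds: int) -> Tuple[int, List[Dict[str, List[str]]]]:
    """Alternate between collecting experience and a toy 'policy update'."""
    history = []
    for _ in range(rounds):
        buckets = partition_by_pass_rate(policy_skill, tasks)
        history.append(buckets)
        if buckets["boundary"]:
            # Placeholder for learning from boundary successes and failures.
            policy_skill += 1
    return policy_skill, history


if __name__ == "__main__":
    tasks = [make_task(f"t{i}", i) for i in range(1, 5)]
    skill, history = evolve(1, tasks, rounds=3)
    print(skill, history[-1])
```

In this toy setting the boundary bucket moves to harder tasks each round as the skill value increases, which is the intuition behind focusing updates where outcomes are mixed; a real system would replace the integer increment with actual policy optimization over the collected trajectories.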