TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution
February 10, 2026
Authors: Deyang Jiang, Jing Huang, Xuanle Zhao, Lei Chen, Liming Zheng, Fanfan Liu, Haibo Qiu, Peng Shi, Zhixiong Zeng
cs.AI
Abstract
Effectively scaling GUI automation is essential for computer-use agents (CUAs). However, existing work primarily focuses on scaling GUI grounding rather than GUI planning, which is more crucial and requires more sophisticated data collection. In practice, a CUA's exploration across apps, desktops, and web pages typically follows a tree structure, with earlier functional entry points explored more frequently. Organizing large-scale trajectories into tree structures can therefore reduce data cost and streamline data scaling for GUI planning. In this work, we propose TreeCUA, which efficiently scales GUI automation through tree-structured verifiable evolution. We design a multi-agent collaborative framework that explores the environment, verifies actions, summarizes trajectories, and evaluates quality to generate high-quality, scalable GUI trajectories. To improve efficiency, we devise a tree-based topology that stores and replays duplicate exploration nodes, and an adaptive exploration algorithm that balances depth (i.e., trajectory difficulty) and breadth (i.e., trajectory diversity). Moreover, we develop world knowledge guidance and global memory backtracking to avoid low-quality generation. Finally, the abundant tree node information naturally leads to TreeCUA-DPO, which improves GUI planning capability by referring to the branch information of adjacent trajectories. Experimental results show that TreeCUA and TreeCUA-DPO offer significant improvements, and out-of-domain (OOD) studies further demonstrate strong generalization. All trajectory node information and code will be available at https://github.com/UITron-hub/TreeCUA.
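To make the tree idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes trajectories are plain lists of action strings, stores them in a prefix tree so that shared early steps are kept only once, and builds preference pairs from sibling branches that share a prefix, which is one plausible reading of "referring to the branch information of adjacent trajectories". The names `Node`, `TrajectoryTree`, `on_success_path`, and `preference_pairs`, as well as the example actions, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One GUI state reached by taking `action` from the parent state."""
    action: Optional[str] = None      # None for the root (initial screen)
    on_success_path: bool = False     # hypothetical flag: lies on a verified trajectory
    children: Dict[str, "Node"] = field(default_factory=dict)


class TrajectoryTree:
    """Prefix tree over action sequences (an assumed data layout)."""

    def __init__(self) -> None:
        self.root = Node()
        self.num_nodes = 0            # stored nodes, excluding the root

    def add(self, actions: List[str], success: bool) -> None:
        """Insert a trajectory; steps shared with earlier ones reuse nodes."""
        node = self.root
        for action in actions:
            child = node.children.get(action)
            if child is None:
                child = Node(action=action)
                node.children[action] = child
                self.num_nodes += 1
            # A node counts as good if any successful trajectory passes through it.
            child.on_success_path = child.on_success_path or success
            node = child

    def preference_pairs(self) -> List[Tuple[Tuple[str, ...], str, str]]:
        """Return (shared_prefix, chosen_action, rejected_action) triples.

        At every branch point, each child on a successful path is paired
        with each sibling that is not (an assumption for illustration).
        """
        pairs = []
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            good = [a for a, c in node.children.items() if c.on_success_path]
            bad = [a for a, c in node.children.items() if not c.on_success_path]
            pairs.extend((prefix, g, b) for g in good for b in bad)
            for action, child in node.children.items():
                stack.append((child, prefix + (action,)))
        return pairs


# Three made-up trajectories that share early entry points.
tree = TrajectoryTree()
tree.add(["open_settings", "click_display", "set_dark_mode"], success=True)
tree.add(["open_settings", "click_display", "set_font_size"], success=False)
tree.add(["open_settings", "click_network"], success=False)

flat_steps = 3 + 3 + 2                # steps if stored as independent lists
print(tree.num_nodes, flat_steps)     # stored nodes vs. flat step count
for prefix, chosen, rejected in sorted(tree.preference_pairs()):
    print(prefix, chosen, rejected)
```

In this toy example the three trajectories contain eight steps in total, but the tree stores only five nodes because `open_settings` and `click_display` are shared. Each branch point then yields a chosen/rejected action pair conditioned on the same prefix, which is the kind of signal a DPO-style objective can consume.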