TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution

February 10, 2026
Authors: Deyang Jiang, Jing Huang, Xuanle Zhao, Lei Chen, Liming Zheng, Fanfan Liu, Haibo Qiu, Peng Shi, Zhixiong Zeng
cs.AI

Abstract

Effectively scaling GUI automation is essential for computer-use agents (CUAs). However, existing work focuses primarily on scaling GUI grounding rather than GUI planning, which is more crucial and requires more sophisticated data collection. In practice, a CUA's exploration across apps, desktops, and web pages typically follows a tree structure, with earlier functional entry points explored more frequently. Organizing large-scale trajectories into trees can therefore reduce data cost and streamline the data scaling of GUI planning. In this work, we propose TreeCUA to efficiently scale GUI automation with tree-structured verifiable evolution. We design a multi-agent collaborative framework that explores the environment, verifies actions, summarizes trajectories, and evaluates quality to generate high-quality, scalable GUI trajectories. To improve efficiency, we devise a novel tree-based topology that stores and replays repeatedly explored nodes, and we design an adaptive exploration algorithm that balances depth (i.e., trajectory difficulty) and breadth (i.e., trajectory diversity). Moreover, we develop world knowledge guidance and global memory backtracking to avoid low-quality generation. Finally, building on the abundant information stored in tree nodes, we propose TreeCUA-DPO, which improves GUI planning capability by referring to the branch information of adjacent trajectories. Experimental results show that TreeCUA and TreeCUA-DPO offer significant improvements, and out-of-domain (OOD) studies further demonstrate strong generalization. All trajectory node information and code will be available at https://github.com/UITron-hub/TreeCUA.
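The abstract describes two ideas. First, explored trajectories are stored as a tree, so shared prefixes such as early functional entry points are kept once and replayed. Second, TreeCUA-DPO draws on the branch information of adjacent trajectories. The following is a minimal Python sketch of these ideas under stated assumptions, not the authors' implementation. The names `Node`, `TrajectoryTree`, `insert`, `replay_path`, and `preference_pairs` are hypothetical, and so is the per-step `verified` flag. Pairing a verified child action with an unverified sibling action is also an illustrative assumption about how sibling branches could yield DPO preference pairs. It is not the paper's confirmed recipe.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One action step in a hypothetical exploration tree."""
    action: str                                   # action leading to this node ("" for root)
    verified: bool = True                         # assumed verifier verdict for this action
    children: dict[str, Node] = field(default_factory=dict)
    parent: Node | None = field(default=None, repr=False)


class TrajectoryTree:
    """Stores trajectories so that shared action prefixes are kept only once."""

    def __init__(self) -> None:
        self.root = Node(action="")
        self.node_count = 0                       # number of stored (non-root) steps

    def insert(self, actions: list[str], verified: list[bool] | None = None) -> Node:
        # Walk the already-stored prefix; only unseen suffix steps create nodes.
        # If a step already exists, its first-recorded verdict is kept.
        flags = verified if verified is not None else [True] * len(actions)
        node = self.root
        for act, ok in zip(actions, flags):
            child = node.children.get(act)
            if child is None:
                child = Node(action=act, verified=ok, parent=node)
                node.children[act] = child
                self.node_count += 1
            node = child
        return node

    def replay_path(self, node: Node) -> list[str]:
        # Recover the action prefix needed to bring the environment back to `node`.
        path = []
        while node.parent is not None:
            path.append(node.action)
            node = node.parent
        return list(reversed(path))

    def preference_pairs(self) -> list[tuple[list[str], str, str]]:
        # For every node, pair each verified child action (chosen) with each
        # unverified sibling action (rejected); both share the same prefix.
        pairs = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            kids = list(node.children.values())
            good = [c for c in kids if c.verified]
            bad = [c for c in kids if not c.verified]
            prefix = self.replay_path(node)
            for g in good:
                for b in bad:
                    pairs.append((prefix, g.action, b.action))
            stack.extend(kids)
        return pairs


if __name__ == "__main__":
    tree = TrajectoryTree()
    tree.insert(["open_settings", "click_display", "set_dark_mode"])
    tree.insert(["open_settings", "click_display", "set_font_size"])
    leaf = tree.insert(["open_settings", "click_sound"], verified=[True, False])
    print(tree.node_count)          # 5 stored steps instead of 8 in flat storage
    print(tree.replay_path(leaf))   # ['open_settings', 'click_sound']
    print(tree.preference_pairs())  # [(['open_settings'], 'click_display', 'click_sound')]
```

In this sketch the three trajectories share the `open_settings` entry point, so it is stored and replayed once. The only preference pair comes from the two sibling branches under that shared prefix. A real system would also need environment snapshots, an exploration policy, and an actual verifier, and the sketch does not model any of these.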
February 12, 2026