TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution

Abstract

Effectively scaling GUI automation is essential for computer-use agents (CUAs). However, existing work primarily focuses on scaling GUI grounding rather than the more crucial GUI planning, which requires more sophisticated data collection. In practice, a CUA's exploration across apps, desktops, and web pages typically follows a tree structure, with earlier functional entry points explored more frequently than later ones. Organizing large-scale trajectories into tree structures can therefore reduce data cost and streamline data scaling for GUI planning. In this work, we propose TreeCUA, which efficiently scales GUI automation through tree-structured verifiable evolution. We build a multi-agent collaborative framework that explores the environment, verifies actions, summarizes trajectories, and evaluates quality, producing high-quality and scalable GUI trajectories. To improve efficiency, we devise a novel tree-based topology that stores and replays duplicate exploration nodes, and we design an adaptive exploration algorithm that balances depth (i.e., trajectory difficulty) and breadth (i.e., trajectory diversity). Moreover, we develop world knowledge guidance and global memory backtracking to avoid low-quality generation. Finally, building on the abundant tree-node information, we naturally extend the framework into TreeCUA-DPO, which improves GUI planning capability by referring to the branch information of adjacent trajectories. Experimental results show that TreeCUA and TreeCUA-DPO yield significant improvements, and out-of-domain (OOD) studies further demonstrate strong generalization. All trajectory node information and code will be available at https://github.com/UITron-hub/TreeCUA.
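
The two tree-related ideas above (storing shared exploration prefixes once, and using adjacent branches as preference signal) can be illustrated with a minimal sketch. This is not the authors' implementation: `Node`, `TrajectoryTree`, `preference_pairs`, and the action strings are hypothetical, and the pairing rule (a sibling whose subtree contains a verified success is "chosen", a sibling whose subtree contains none is "rejected") is an assumption made only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One GUI action in a hypothetical exploration tree."""
    action: str
    success: bool | None = None  # verifier outcome at a trajectory end; None if not an end
    children: dict[str, Node] = field(default_factory=dict)


class TrajectoryTree:
    """Prefix tree over action sequences: shared prefixes are stored only once."""

    def __init__(self) -> None:
        self.root = Node(action="<root>")
        self.total_steps = 0   # steps summed over every inserted trajectory
        self.stored_nodes = 0  # distinct nodes actually kept in the tree

    def insert(self, actions: list[str], success: bool) -> None:
        node = self.root
        for a in actions:
            self.total_steps += 1
            if a not in node.children:  # only new branches cost storage
                node.children[a] = Node(action=a)
                self.stored_nodes += 1
            node = node.children[a]
        node.success = success

    def preference_pairs(self):
        """Yield (prefix, chosen_action, rejected_action) from sibling branches.

        Assumed rule: a child is 'chosen' if its subtree has a verified success,
        'rejected' if it has none.
        """
        def has_success(n: Node) -> bool:
            return bool(n.success) or any(has_success(c) for c in n.children.values())

        def walk(n: Node, prefix: list[str]):
            good = [a for a, c in n.children.items() if has_success(c)]
            bad = [a for a, c in n.children.items() if not has_success(c)]
            for g in good:
                for b in bad:
                    yield (tuple(prefix), g, b)
            for a, c in n.children.items():
                yield from walk(c, prefix + [a])

        yield from walk(self.root, [])


# Usage with hypothetical action names.
tree = TrajectoryTree()
tree.insert(["open_settings", "click_display", "set_dark_mode"], success=True)
tree.insert(["open_settings", "click_display", "set_font_size"], success=False)
tree.insert(["open_settings", "click_sound"], success=False)
pairs = list(tree.preference_pairs())
print(tree.total_steps, tree.stored_nodes)  # 8 steps inserted, 5 nodes stored
print(pairs)
```

In this toy example the shared prefix `open_settings → click_display` is stored once even though two trajectories pass through it, and each branching point with both a successful and a failed sibling yields one (prefix, chosen, rejected) triple of the kind a DPO-style objective consumes.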
