UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Abstract

Autonomous mobile GUI agents have attracted increasing attention with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still learn inefficiently from failed trajectories and face ambiguous credit assignment under sparse rewards in long-horizon GUI tasks. To address these issues, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
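The abstract does not say how fork points are detected or how the step-level supervision is built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each rollout is a list of (state, action) steps with a hashable state summary, and it treats a fork point as the first step at which a failed rollout is in the same state as a successful rollout from the same group but chooses a different action. The names `Step`, `Trajectory`, `rejection_filter`, `first_fork`, and `step_level_corrections` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    state: str   # hypothetical hashable summary of the screen
    action: str  # hypothetical action string, e.g. "tap(settings)"


@dataclass
class Trajectory:
    steps: List[Step]
    success: bool  # sparse, episode-level outcome


def rejection_filter(group: List[Trajectory]) -> List[Trajectory]:
    """Rejection-style filtering: keep only the successful rollouts."""
    return [t for t in group if t.success]


def first_fork(failed: Trajectory, reference: Trajectory) -> Optional[int]:
    """Return the first step index where both rollouts share a state but
    choose different actions, or None if no such aligned step exists."""
    for i, (f, r) in enumerate(zip(failed.steps, reference.steps)):
        if f.state != r.state:
            return None  # states already diverged; no aligned comparison
        if f.action != r.action:
            return i
    return None


def step_level_corrections(group: List[Trajectory]) -> List[Tuple[str, str]]:
    """For each failed rollout, emit one (state, corrected_action) pair
    taken from a successful rollout at the assumed fork point."""
    successes = rejection_filter(group)
    corrections = []
    for t in group:
        if t.success:
            continue
        for ref in successes:
            i = first_fork(t, ref)
            if i is not None:
                corrections.append((ref.steps[i].state, ref.steps[i].action))
                break
    return corrections
```

Under these assumptions, a single episode-level success signal is turned into a per-step training target at the point where the failed rollout went wrong, which is one way to read "dense step-level supervision". A real system would need a more robust notion of state equality than exact string matching.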
