UI-Voyager: A Self-Evolving GUI Agent That Learns from Failed Experience

Abstract

Autonomous mobile GUI agents have attracted increasing attention with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still face two challenges: they learn inefficiently from failed trajectories, and credit assignment is ambiguous under the sparse rewards of long-horizon GUI tasks. To this end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B-parameter model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
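
The two stages can be illustrated with a toy sketch. The abstract does not say how fork points are detected or how states are compared, so everything below is an assumption made for illustration: the `Step` and `Trajectory` types, the string screen identifiers, and the rule "the first shared state where the failed rollout chose a different action than a successful one" are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    state: str   # hypothetical hashable screen identifier (assumption)
    action: str  # hypothetical action label (assumption)


@dataclass
class Trajectory:
    steps: List[Step]
    success: bool


def rejection_filter(group: List[Trajectory]) -> List[Trajectory]:
    """RFT-style filtering: keep only the rollouts that completed the task."""
    return [t for t in group if t.success]


def find_fork(failed: Trajectory, succeeded: Trajectory) -> Optional[Tuple[str, str]]:
    """Return (state, successful_action) at the first state both rollouts
    visited where the failed rollout acted differently, or None."""
    # First action the successful rollout took in each state it visited.
    reference = {}
    for step in succeeded.steps:
        reference.setdefault(step.state, step.action)
    for step in failed.steps:
        if step.state in reference and reference[step.state] != step.action:
            return (step.state, reference[step.state])
    return None


def build_corrections(group: List[Trajectory]) -> List[Tuple[str, str]]:
    """For each failed rollout, emit one step-level correction taken from
    the first successful rollout in the same group that shares a fork."""
    winners = rejection_filter(group)
    corrections = []
    for traj in group:
        if traj.success:
            continue
        for winner in winners:
            fork = find_fork(traj, winner)
            if fork is not None:
                corrections.append(fork)
                break
    return corrections
```

In this sketch a group with no successful rollout yields no corrections, which is one reason a filtering stage that raises the success rate first would plausibly be useful before step-level correction.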