UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Abstract

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
