UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Abstract

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
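As a rough illustration of the rejection step that the first stage relies on, the sketch below keeps only successful rollouts and flattens them into supervised (observation, action) pairs. It is a minimal sketch of generic rejection-style filtering, not the authors' pipeline: the `Step`, `Trajectory`, and `rejection_filter` names, and the use of plain strings in place of screenshots and GUI actions, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    observation: str  # hypothetical stand-in for a screenshot or UI state
    action: str       # hypothetical stand-in for a GUI action


@dataclass
class Trajectory:
    task_id: str
    steps: List[Step]
    success: bool  # sparse, end-of-episode outcome


def rejection_filter(rollouts: List[Trajectory]) -> List[Tuple[str, str]]:
    """Keep successful rollouts only and flatten them into (observation, action) pairs."""
    samples: List[Tuple[str, str]] = []
    for traj in rollouts:
        if not traj.success:
            continue  # rejected: a failed rollout contributes no training sample here
        samples.extend((step.observation, step.action) for step in traj.steps)
    return samples


# Two rollouts of the same (made-up) task: one succeeds, one fails.
rollouts = [
    Trajectory("t1", [Step("home", "open_app"), Step("app", "tap_save")], True),
    Trajectory("t1", [Step("home", "scroll_down")], False),
]
sft_samples = rejection_filter(rollouts)
```

In this simplified form the failed rollout is discarded entirely, which is the kind of inefficiency the abstract attributes to existing methods and that the second stage is described as addressing with step-level supervision.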
