UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Abstract

In this paper, we introduce UI-Genie, a self-improving framework that addresses two key challenges in GUI agents: trajectory outcomes are difficult to verify, and high-quality training data are difficult to scale. We address these challenges with a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For model training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.
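
The abstract names controlled trajectory corruption as one way to obtain negative examples for the reward model, but does not say how it is implemented. The sketch below is only an illustration under stated assumptions: a trajectory is taken to be a list of (screen, action) steps, and a corruption is taken to mean replacing exactly one action with a different one drawn from a hypothetical action space. The names `Step`, `corrupt_trajectory`, and `build_reward_examples` are invented for this example and are not taken from the UI-Genie codebase.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a GUI trajectory (hypothetical representation)."""
    screen: str  # placeholder for a screenshot identifier
    action: str  # e.g. "tap:search_box"


def corrupt_trajectory(
    steps: Sequence[Step],
    action_space: Sequence[str],
    rng: random.Random,
) -> Tuple[List[Step], int]:
    """Replace the action at one random step with a different action.

    Returns the corrupted copy and the index of the corrupted step.
    This is an assumed corruption scheme, not the paper's exact procedure.
    """
    if not steps:
        raise ValueError("trajectory must contain at least one step")
    idx = rng.randrange(len(steps))
    wrong_actions = [a for a in action_space if a != steps[idx].action]
    if not wrong_actions:
        raise ValueError("action space offers no alternative action")
    corrupted = list(steps)
    corrupted[idx] = Step(steps[idx].screen, rng.choice(wrong_actions))
    return corrupted, idx


def build_reward_examples(
    steps: Sequence[Step],
    action_space: Sequence[str],
    rng: random.Random,
) -> List[Tuple[List[Step], int]]:
    """Pair a successful trajectory (label 1) with a corrupted copy (label 0)."""
    corrupted, _ = corrupt_trajectory(steps, action_space, rng)
    return [(list(steps), 1), (corrupted, 0)]


if __name__ == "__main__":
    trajectory = [
        Step("home", "tap:search_box"),
        Step("search", "type:weather"),
        Step("results", "tap:first_result"),
    ]
    actions = ["tap:search_box", "type:weather", "tap:first_result", "back"]
    for traj, label in build_reward_examples(trajectory, actions, random.Random(0)):
        print(label, [s.action for s in traj])
```

Keeping the screen fixed and changing only the action means the negative example differs from the positive one in a single, known place, which is one plausible way to obtain step-level supervision without manual labeling.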
