UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Abstract

In this paper, we introduce UI-Genie, a self-improving framework that addresses two key challenges in GUI agents: trajectory outcomes are difficult to verify, and high-quality training data are difficult to scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, the self-improving pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and the reward model through reward-guided exploration and outcome verification in dynamic environments. For model training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that, with three generations of data-model self-improvement, UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.
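The abstract names controlled trajectory corruption as one way to build training data for the reward model but does not say how it is carried out. As a rough illustration only, the toy sketch below assumes that a corruption swaps exactly one step of a successful trajectory for a different action, so that the erroneous position is known and can be labeled. The `Step` schema, the function names, and the labeling rule are all hypothetical and are not taken from the UI-Genie code.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One GUI action in a trajectory (hypothetical schema, not UI-Genie's)."""
    action: str  # e.g. "tap", "type", "swipe"
    target: str  # e.g. a description of the UI element acted on


def corrupt_trajectory(
    steps: List[Step],
    distractors: List[Step],
    rng: random.Random,
) -> Tuple[List[Step], int]:
    """Return a corrupted copy of `steps` and the index that was changed.

    Exactly one step is swapped for a distractor that differs from the
    original, so the position of the error is known by construction.
    """
    if not steps:
        raise ValueError("cannot corrupt an empty trajectory")
    idx = rng.randrange(len(steps))
    candidates = [d for d in distractors if d != steps[idx]]
    if not candidates:
        raise ValueError("no distractor differs from the chosen step")
    corrupted = list(steps)  # copy, so the original trajectory is untouched
    corrupted[idx] = rng.choice(candidates)
    return corrupted, idx


def label_steps(corrupted_idx: int) -> List[int]:
    """Assumed labeling rule: steps before the error are 1, the error is 0.

    Steps after the error are left unlabeled, since their correctness is
    unknown once the trajectory has diverged.
    """
    return [1] * corrupted_idx + [0]


good = [Step("tap", "Settings"), Step("tap", "Wi-Fi"), Step("tap", "Toggle")]
distractors = [Step("tap", "Bluetooth"), Step("swipe", "up")]
bad, idx = corrupt_trajectory(good, distractors, random.Random(0))
labels = label_steps(idx)
```

Because the corrupted index is returned alongside the trajectory, the same sample could in principle serve as an action-level negative (the swapped step) and a task-level negative (the whole trajectory), which is one way to read the abstract's claim that the reward model unifies the two levels.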
