
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

May 27, 2025
Authors: Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li
cs.AI

Abstract

In this paper, we introduce UI-Genie, a self-improving framework that addresses two key challenges in GUI agents: trajectory outcomes are difficult to verify, and high-quality training data are hard to scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and the reward model through reward-guided exploration and outcome verification in dynamic environments. For model training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.
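
To make "controlled trajectory corruption" more concrete, here is a minimal sketch of one way negative samples for a reward model could be derived from a successful trajectory. It is an illustration only and is not taken from the UI-Genie code: the `Step` type, the `corrupt_trajectory` function, the action vocabulary, and the choice to swap exactly one action are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One GUI action in a trajectory (hypothetical schema)."""
    action: str  # e.g. "click", "type", "scroll"
    target: str  # hypothetical UI element identifier


def corrupt_trajectory(
    steps: Sequence[Step],
    action_vocab: Sequence[str],
    rng: random.Random,
) -> Tuple[List[Step], int]:
    """Return a copy of `steps` with one step's action replaced, plus its index.

    Assumption: a single swapped action is enough to turn a successful
    trajectory into a negative example for reward-model training.
    """
    if not steps:
        raise ValueError("cannot corrupt an empty trajectory")
    idx = rng.randrange(len(steps))
    original = steps[idx]
    # Only consider actions that differ from the original one.
    alternatives = [a for a in action_vocab if a != original.action]
    if not alternatives:
        raise ValueError("action_vocab offers no alternative action")
    corrupted = list(steps)  # copy, so the positive trajectory is untouched
    corrupted[idx] = Step(rng.choice(alternatives), original.target)
    return corrupted, idx


if __name__ == "__main__":
    rng = random.Random(0)
    positive = [
        Step("click", "search_box"),
        Step("type", "search_box"),
        Step("click", "submit"),
    ]
    negative, idx = corrupt_trajectory(positive, ["click", "type", "scroll"], rng)
    print(f"corrupted step {idx}: {positive[idx]} -> {negative[idx]}")
```

Returning the corrupted index lets a caller attach an action-level label to the altered step while keeping the original trajectory as the paired positive example.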
