UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Abstract

In this paper, we introduce UI-Genie, a self-improving framework that addresses two key challenges for GUI agents: trajectory outcomes are hard to verify, and high-quality training data are hard to scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and the reward model through reward-guided exploration and outcome verification in dynamic environments. To train the models, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents and demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks after three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets at https://github.com/Euphoria16/UI-Genie to facilitate further research.
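To make "controlled trajectory corruption" and the split between action-level and task-level rewards more concrete, here is a minimal sketch of how a successful trajectory might be turned into a labeled negative sample for a reward model. It is an illustration under stated assumptions, not the UI-Genie implementation: the `Step` and `LabeledTrajectory` types, the `corrupt_trajectory` helper, the string action format, and the labeling rule (steps before the corrupted one stay positive, the corrupted step and the task are negative, later steps are dropped) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    screen: str  # hypothetical screenshot identifier
    action: str  # hypothetical action string, e.g. "tap(wifi)"


@dataclass
class LabeledTrajectory:
    instruction: str
    steps: List[Step]
    step_labels: List[int]  # action-level labels: 1 = correct action, 0 = incorrect
    task_label: int  # task-level label: 1 = task completed, 0 = failed


def label_positive(instruction: str, steps: List[Step]) -> LabeledTrajectory:
    """Label a verified successful trajectory as positive at both levels."""
    return LabeledTrajectory(instruction, list(steps), [1] * len(steps), task_label=1)


def corrupt_trajectory(instruction: str, steps: List[Step],
                       action_pool: List[str], rng: random.Random) -> LabeledTrajectory:
    """Build a negative sample by replacing one action of a successful trajectory.

    Assumed labeling rule (hypothetical): steps before the corrupted index keep
    positive action-level labels, the corrupted step is negative, later steps
    are truncated, and the task-level label becomes negative.
    """
    if not steps:
        raise ValueError("trajectory must contain at least one step")
    k = rng.randrange(len(steps))  # index of the step to corrupt
    original = steps[k].action
    alternatives = [a for a in action_pool if a != original]
    if not alternatives:
        raise ValueError("action_pool needs an action different from the original")
    wrong_step = Step(steps[k].screen, rng.choice(alternatives))
    new_steps = list(steps[:k]) + [wrong_step]
    return LabeledTrajectory(instruction, new_steps, [1] * k + [0], task_label=0)


# Usage: one positive and one corrupted negative from a toy "turn on Wi-Fi" trajectory.
rng = random.Random(0)
demo = [Step("home", "tap(settings)"), Step("settings", "tap(wifi)"), Step("wifi", "toggle(on)")]
pool = ["tap(settings)", "tap(wifi)", "toggle(on)", "press_back()", "scroll(down)"]
pos = label_positive("Turn on Wi-Fi", demo)
neg = corrupt_trajectory("Turn on Wi-Fi", demo, pool, rng)
print(neg.step_labels, neg.task_label)
```

Truncating after the corrupted step is one simple choice that keeps every action-level label well defined; a real pipeline could instead roll the environment forward from the wrong action, which this sketch does not attempt.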
