UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Abstract

In this paper, we introduce UI-Genie, a self-improving framework that addresses two key challenges in GUI agents: trajectory outcomes are difficult to verify, and high-quality training data is difficult to scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For model training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets at https://github.com/Euphoria16/UI-Genie to facilitate further research.
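
The self-improvement loop summarized in the abstract can be illustrated with a toy sketch. This is not the UI-Genie implementation: every name here (`Trajectory`, `rollout`, `reward_score`, `verify_outcome`, `self_improve`), the integer `skill` and `difficulty` values, and the `skill += 1` stand-in for retraining are assumptions made purely for illustration. It only shows the general shape of reward-guided selection followed by outcome verification, repeated over three generations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    task: str
    steps: tuple[str, ...]
    success: bool  # toy ground truth; a real system would not have this label


def rollout(task: str, difficulty: int, skill: int, n_candidates: int = 3) -> list[Trajectory]:
    """Hypothetical agent: candidate k succeeds only if skill + k reaches the difficulty."""
    steps = tuple(f"action_{i}" for i in range(difficulty))
    return [Trajectory(task, steps, skill + k >= difficulty) for k in range(n_candidates)]


def reward_score(traj: Trajectory) -> float:
    """Toy stand-in for a reward model (it simply reads the toy label)."""
    return 1.0 if traj.success else 0.0


def verify_outcome(traj: Trajectory) -> bool:
    """Toy stand-in for outcome verification in the environment."""
    return traj.success


def self_improve(tasks: dict[str, int], generations: int = 3):
    skill = 1
    agent_data: list[Trajectory] = []                # verified trajectories for the agent
    reward_data: list[tuple[Trajectory, int]] = []   # labeled samples for the reward model
    solved_per_generation: list[int] = []
    for _ in range(generations):
        solved = 0
        for task, difficulty in tasks.items():
            candidates = rollout(task, difficulty, skill)
            best = max(candidates, key=reward_score)  # reward-guided selection
            if verify_outcome(best):                  # keep only verified trajectories
                agent_data.append(best)
                solved += 1
            reward_data.extend((c, int(verify_outcome(c))) for c in candidates)
        solved_per_generation.append(solved)
        skill += 1  # assumption: stands in for retraining agent and reward model
    return agent_data, reward_data, solved_per_generation
```

With the hypothetical task set `{"open_settings": 2, "send_email": 4, "book_flight": 5}`, the number of solved tasks per generation grows as `[1, 2, 3]`, which mirrors the idea of progressively expanding the set of solvable tasks.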
