InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Abstract



The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, preventing models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AEPO employs a multi-answer generation strategy to enforce broader exploration, which is then guided by a theoretically grounded Adaptive Exploration Reward (AER) function derived from the first principle of efficiency, η = U/C (utility U over cost C). Our AEPO-trained models, InfiGUI-G1-3B and InfiGUI-G1-7B, establish new state-of-the-art results across multiple challenging GUI grounding benchmarks, achieving relative improvements of up to 9.0% over the naive RLVR baseline on benchmarks designed to test generalization and semantic understanding. Resources are available at https://github.com/InfiXAI/InfiGUI-G1.
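To make the two ingredients of the abstract concrete (a verifiable grounding check applied to several candidate answers, and a reward shaped as an efficiency ratio η = U/C), here is a minimal Python sketch. It is not the paper's AER function. The bounding-box format, the function names `point_in_box` and `efficiency_reward`, and the specific utility and cost choices (success: U = +1, C = sqrt(N·k) with k the rank of the first correct answer; failure: U = -1, C = N) are hypothetical assumptions chosen only to illustrate how such a ratio can reward finding the correct element early while discouraging unnecessarily long answer lists.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def point_in_box(p: Point, box: Box) -> bool:
    """Verifiable check: does a predicted click point land inside the target element?"""
    x, y = p
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def efficiency_reward(answers: Sequence[Point], box: Box) -> float:
    """Illustrative eta = U / C reward over N candidate answers (assumed cost model).

    success: U = +1, C = sqrt(N * k), k = 1-based rank of the first correct answer
    failure: U = -1, C = N
    """
    n = len(answers)
    if n == 0:
        raise ValueError("at least one answer is required")
    for k, p in enumerate(answers, start=1):
        if point_in_box(p, box):
            return 1.0 / math.sqrt(n * k)  # U = +1 divided by the assumed cost
    return -1.0 / n  # U = -1 divided by the assumed cost


if __name__ == "__main__":
    target = (100, 200, 180, 240)
    print(efficiency_reward([(150, 220)], target))            # 1.0
    print(efficiency_reward([(10, 10), (150, 220)], target))  # 0.5
    print(efficiency_reward([(10, 10), (20, 20)], target))    # -0.5
```

Under these assumed costs, a single correct answer earns the full reward, a correct answer found second in a list of two earns less, and a list with no correct answer is penalized.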
