MobileForge: Annotation-Free Adaptation of Mobile GUI Agents via Hierarchical Feedback-Guided Policy Optimization

Abstract

MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.
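
To make the GRPO component concrete, the sketch below shows the standard group-relative advantage normalization that GRPO-style methods use: the rewards of several rollouts for the same task are standardized against the group mean and standard deviation. This illustrates generic GRPO only, not HiFPO itself. The function name, the `eps` constant, and the binary example rewards are assumptions made for illustration; the abstract does not specify how outcomes, step-level feedback, and hints are combined into rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (generic GRPO-style).

    `rewards` holds one scalar per rollout of the same task. The name and
    the `eps` default are illustrative assumptions, not taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division defined when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts: two succeeded (1.0), two failed (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In this sketch, successful rollouts receive positive advantages and failed ones negative advantages, and a group whose rollouts all get the same reward yields zero advantages, so it contributes no learning signal. A step-level variant would apply the same normalization to per-step rewards rather than per-trajectory ones.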