MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization
June 18, 2026
Authors: Guangyi Liu, Pengxiang Zhao, Gao Wu, Yiwen Yin, Mading Li, Liang Liu, Congxiao Liu, Zhang Qi, Mengyan Wang, Liang Guo, Yong Liu
cs.AI
Abstract
MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.
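The abstract names GRPO as the update rule but does not say how HiFPO builds its step-level rewards from outcomes, process feedback, and hints. As a rough illustration only, the sketch below shows the standard group-relative normalization that GRPO-style methods apply to a group of sampled candidates. The function name, the example rewards, and the choice of population standard deviation are all assumptions made for illustration, not details taken from MobileForge.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    This is the generic GRPO-style advantage, not MobileForge's exact rule.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate actions sampled at the same step.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Candidates that score above the group mean get positive advantages and the rest get negative ones, so no separate value network is needed. A group whose rewards are all equal yields zero advantages, which is one reason coarse rewards can fail to produce a useful improvement signal.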