

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

April 16, 2026
Authors: Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, Anh Tuan Luu, Jianbing Zhang, Lewei Lu, Dahua Lin
cs.AI

Abstract

Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration and then leverages it to generate diverse, grounded instructions; and (2) a policy-switching strategy for trajectory rollout that alternates between learner and expert models to capture the error-recovery data often missing from standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% success on AndroidWorld, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and benchmark test sets, verifying that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
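To make the policy-switching idea concrete, the following is a minimal, hypothetical Python sketch of a rollout that alternates control between a learner and an expert policy. The abstract does not specify the switching criterion or interfaces, so the stochastic `switch_p` trigger, the `env` reset/step interface, and the policy call signatures are all illustrative assumptions, not the paper's actual implementation:

```python
import random

def rollout_with_policy_switching(env, learner, expert, max_steps=20, switch_p=0.3):
    """Hypothetical sketch: alternate control between learner and expert.

    At each step, control switches to the other policy with probability
    switch_p (an assumed criterion), so the recorded trajectory includes
    expert actions taken from states the learner reached -- the kind of
    error-recovery data standard imitation learning tends to miss.
    """
    trajectory = []
    policy, name = learner, "learner"  # learner acts first by default
    obs = env.reset()
    for _ in range(max_steps):
        if random.random() < switch_p:  # assumed stochastic switch trigger
            policy, name = (expert, "expert") if name == "learner" else (learner, "learner")
        action = policy(obs)
        trajectory.append((obs, action, name))  # record who produced the action
        obs, done = env.step(action)
        if done:
            break
    return trajectory
```

Each trajectory step is tagged with its source policy, so a downstream filter could, for example, keep only expert actions for supervision while using learner actions to diversify the visited states.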