OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
April 16, 2026
Authors: Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, Anh Tuan Luu, Jianbing Zhang, Lewei Lu, Dahua Lin
cs.AI
Abstract
Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that first constructs a global environment memory from exploration and then leverages it to generate diverse and grounded instructions; and (2) a policy-switching strategy for trajectory rollout that alternates between learner and expert models to capture essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, respectively, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and benchmark test sets, and verify that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
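To make the policy-switching idea concrete, the following is a minimal toy sketch in Python, not the authors' implementation. The abstract does not specify how or when control passes between the learner and the expert, so everything here is an assumption made for illustration: the one-dimensional integer environment, the names `Step`, `expert_policy`, `make_learner_policy`, and `rollout_with_policy_switching`, the "learner moved away from the goal" trigger, and the fixed `expert_steps` hand-over budget. The point it illustrates is only that letting the learner act first produces off-track states, and that the expert's actions from those states are the kind of error-recovery data a purely expert-driven rollout would not contain.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Toy setting (assumed): the state is an integer position, the goal is a
# target position, and an action is a step of -1 or +1.
Policy = Callable[[int, int], int]  # (state, goal) -> action


@dataclass
class Step:
    state: int   # state in which the action was chosen
    action: int  # action taken (-1 or +1)
    actor: str   # "learner" or "expert"


def expert_policy(state: int, goal: int) -> int:
    """Always step toward the goal."""
    return 1 if goal > state else -1


def make_learner_policy(bad_states: Set[int]) -> Policy:
    """Hypothetical imperfect learner: steps the wrong way in `bad_states`."""
    def policy(state: int, goal: int) -> int:
        action = expert_policy(state, goal)
        return -action if state in bad_states else action
    return policy


def rollout_with_policy_switching(
    learner: Policy,
    expert: Policy,
    start: int,
    goal: int,
    max_steps: int = 20,
    expert_steps: int = 2,
) -> List[Step]:
    """Roll out with the learner, handing control to the expert after a mistake.

    Assumed switching rule: if a learner action increases the distance to the
    goal, the expert acts for the next `expert_steps` steps, then the learner
    resumes. The expert steps start from a state the learner's error produced,
    so they record a recovery.
    """
    state = start
    steps: List[Step] = []
    expert_budget = 0
    while state != goal and len(steps) < max_steps:
        if expert_budget > 0:
            actor, policy = "expert", expert
            expert_budget -= 1
        else:
            actor, policy = "learner", learner
        action = policy(state, goal)
        next_state = state + action
        steps.append(Step(state, action, actor))
        # Hypothetical trigger: the learner just moved away from the goal.
        if actor == "learner" and abs(goal - next_state) > abs(goal - state):
            expert_budget = expert_steps
        state = next_state
    return steps


# Usage: the learner errs at state 2, the expert recovers, the learner finishes.
trajectory = rollout_with_policy_switching(
    make_learner_policy({2}), expert_policy, start=0, goal=4
)
for step in trajectory:
    print(step.actor, step.state, step.action)
```

In this toy run the two expert steps begin at state 1, which the trajectory only revisits because the learner stepped backward from state 2. A real mobile-agent pipeline would replace the integer state with screenshots or UI trees and would need its own criterion for detecting that the learner has gone off track.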