OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Abstract

Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, and recent leading models have achieved a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories. It has two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration and then leverages it to generate diverse, grounded instructions; and (2) a policy-switching strategy for trajectory rollout that alternates between learner and expert models to capture the essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks. Notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, respectively, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and the benchmark test sets, and verify that the performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
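The policy-switching rollout described in the abstract can be illustrated with a toy sketch. The abstract does not specify the switching rule, so everything below is an assumption made for illustration: the fixed-interval schedule (`switch_every`), the integer "environment", and the names `rollout_with_switching`, `toy_learner`, and `toy_expert` are hypothetical and are not taken from the OpenMobile code. The sketch only shows why handing control to an expert after the learner has drifted yields recovery steps that a pure expert demonstration would never contain.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int    # toy state: a position on a number line
Action = int   # toy action: +1 or -1
Policy = Callable[[State], Action]

GOAL = 3  # hypothetical goal position for the toy task


@dataclass
class Step:
    state: State
    action: Action
    actor: str  # "learner" or "expert"


def toy_learner(state: State) -> Action:
    # Imperfect policy: moves toward the goal except at state 1,
    # where it makes a mistake and steps backward.
    return -1 if state == 1 else 1


def toy_expert(state: State) -> Action:
    # Always moves toward the goal.
    return 1 if state < GOAL else -1


def rollout_with_switching(
    start: State,
    goal: State,
    learner: Policy,
    expert: Policy,
    switch_every: int = 2,   # assumed fixed-interval switching rule
    max_steps: int = 20,
) -> Tuple[List[Step], bool]:
    """Alternate control between learner and expert every `switch_every` steps."""
    steps: List[Step] = []
    state = start
    for t in range(max_steps):
        if state == goal:
            break
        learner_turn = (t // switch_every) % 2 == 0
        policy, name = (learner, "learner") if learner_turn else (expert, "expert")
        action = policy(state)
        steps.append(Step(state, action, name))
        state = state + action  # toy transition
    return steps, state == goal


if __name__ == "__main__":
    trajectory, success = rollout_with_switching(0, GOAL, toy_learner, toy_expert)
    for step in trajectory:
        print(step)
    print("success:", success)
```

In this toy run the learner's mistake sends the state back to 0, and the expert's following steps start from that off-track state. Those expert-labeled steps are the kind of error-recovery data the abstract refers to; how OpenMobile actually decides when to switch is not stated here.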
